es-ik

Kind of Chinese Analysis for Elasticsearch

GroupId: io.github.zacker330.es
ArtifactId: ik-analysis-core
Last Version: 1.0.0
Type: jar
Description: es-ik, Kind of Chinese Analysis for Elasticsearch
Project URL: https://github.com/zacker330/es-ik
Source Code Management: https://github.com/zacker330/es-ik

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/io.github.zacker330.es/ik-analysis-core/ -->
<dependency>
    <groupId>io.github.zacker330.es</groupId>
    <artifactId>ik-analysis-core</artifactId>
    <version>1.0.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/io.github.zacker330.es/ik-analysis-core/
implementation 'io.github.zacker330.es:ik-analysis-core:1.0.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/io.github.zacker330.es/ik-analysis-core/
implementation ("io.github.zacker330.es:ik-analysis-core:1.0.0")

Buildr:

'io.github.zacker330.es:ik-analysis-core:jar:1.0.0'

Ivy:

<dependency org="io.github.zacker330.es" name="ik-analysis-core" rev="1.0.0">
  <artifact name="ik-analysis-core" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='io.github.zacker330.es', module='ik-analysis-core', version='1.0.0')
)

SBT:

libraryDependencies += "io.github.zacker330.es" % "ik-analysis-core" % "1.0.0"

Leiningen:

[io.github.zacker330.es/ik-analysis-core "1.0.0"]

Dependencies

runtime (1)

Group / Artifact                   Type  Version
ch.qos.logback : logback-classic   jar   1.1.3

test (1)

Group / Artifact   Type  Version
junit : junit      jar   4.12

Project Modules

There are no modules declared in this project.

Kind of Chinese Analysis for Elasticsearch

Requirements

- Java 7 update 55 or later

Structure of es-ik

  • ik-analysis-core

    The algorithm in this module comes from ik-analyzer. In principle, you can use this module to implement a Solr analyzer plugin or an Elasticsearch plugin.

    You only need to implement the DictionaryConfiguration interface to provide the dictionary content used during analysis.

  • ik-analysis-es-plugin

    Integrates the ik-analysis-core module with Elasticsearch. It defines an SPI: Configuration, which extends DictionaryConfiguration.

  • es-ik-sqlite3

    Persists the dictionary content in a SQLite3 database. This module is a service provider for the Configuration SPI defined in ik-analysis-es-plugin.
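
To make the role of DictionaryConfiguration more concrete, here is a minimal sketch of an in-memory dictionary provider. The interface declared inside the example is a hypothetical stand-in: its three methods are guessed from the three dictionary tables used by es-ik-sqlite3 (main, quantifier, stopword), and the real interface in ik-analysis-core may declare different methods. The class names and sample terms are also illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InMemoryDictionaryExample {

    /**
     * Hypothetical stand-in for DictionaryConfiguration.
     * The real interface in ik-analysis-core may declare different methods.
     */
    public interface DictionaryConfiguration {
        List<String> getMainDictionary();
        List<String> getQuantifierDictionary();
        List<String> getStopWordDictionary();
    }

    /** Serves three fixed word lists from memory instead of a database. */
    public static class InMemoryDictionaryConfiguration implements DictionaryConfiguration {
        private final List<String> main;
        private final List<String> quantifiers;
        private final List<String> stopWords;

        public InMemoryDictionaryConfiguration(List<String> main,
                                               List<String> quantifiers,
                                               List<String> stopWords) {
            // Wrap the lists so callers cannot modify the dictionary content.
            this.main = Collections.unmodifiableList(main);
            this.quantifiers = Collections.unmodifiableList(quantifiers);
            this.stopWords = Collections.unmodifiableList(stopWords);
        }

        @Override
        public List<String> getMainDictionary() {
            return main;
        }

        @Override
        public List<String> getQuantifierDictionary() {
            return quantifiers;
        }

        @Override
        public List<String> getStopWordDictionary() {
            return stopWords;
        }
    }

    /** Builds a tiny sample configuration; the terms are illustrative only. */
    public static DictionaryConfiguration sample() {
        return new InMemoryDictionaryConfiguration(
                Arrays.asList("林夕", "作词"),
                Arrays.asList("个", "首"),
                Arrays.asList("的", "了"));
    }

    public static void main(String[] args) {
        DictionaryConfiguration config = sample();
        System.out.println("main terms: " + config.getMainDictionary().size());
    }
}
```

A provider backed by SQLite or Redis would follow the same shape, reading the three word lists from its store instead of from fixed arrays.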

How to use es-ik

ik-analysis-es-plugin exposes the DictionaryConfiguration interface as a kind of SPI. es-ik-sqlite3 implements it so that ik-analysis-es-plugin can read the dictionary content from SQLite. In other words, you can write your own implementation, for example one that persists the dictionary content in Redis.

SPI is just a concept. In Java, I use ServiceLoader to implement it. As long as your implementation follows ServiceLoader's conventions, you get a new dictionary provider for ik-analysis-es-plugin without changing the ik-analysis-es-plugin module. :P
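
The ServiceLoader mechanism mentioned above can be sketched as follows. This is not the plugin's actual loading code; the Configuration interface here is a hypothetical one-method stand-in declared inside the example. What is standard Java is the discovery rule: a provider jar lists its implementation class names in a file under META-INF/services/ named after the interface's fully qualified name, and ServiceLoader.load instantiates whatever it finds on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceLoaderSketch {

    /**
     * Hypothetical SPI used only for this sketch. In es-ik the plugin's
     * Configuration interface plays this role.
     */
    public interface Configuration {
        String name();
    }

    /**
     * Asks ServiceLoader for every Configuration provider registered on the
     * classpath and collects their names. The list is empty when no provider
     * jar is present.
     */
    public static List<String> discoverProviderNames() {
        List<String> names = new ArrayList<String>();
        for (Configuration provider : ServiceLoader.load(Configuration.class)) {
            names.add(provider.name());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = discoverProviderNames();
        if (names.isEmpty()) {
            System.out.println("No Configuration provider found on the classpath");
        } else {
            System.out.println("Found providers: " + names);
        }
    }
}
```

Because discovery happens at runtime, swapping the SQLite provider for another one only means putting a different provider jar on the classpath.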

How to use es-ik-sqlite3 (current version 1.0.1)

  • Tell Elasticsearch where your SQLite3 database is by adding a setting to your elasticsearch.yml, like:

      ik_analysis_db_path: /opt/ik/dictionary.db
    

    PS: you can download my dictionary.db from https://github.com/zacker330/es-ik-sqlite3-dictionary

  • Go to your Elasticsearch folder, then install the plugin:

      ./bin/plugin -i ik-analysis -u https://github.com/zacker330/es-ik-plugin-sqlite3-release/raw/master/es-ik-sqlite3-1.0.1.zip
    
  • test your configuration:

  1. create songs index

     curl -X PUT -H "Cache-Control: no-cache" -d '{
         "settings":{
             "index":{
                 "number_of_shards":1,
                 "number_of_replicas": 1
             }
         }
     }' 'http://localhost:9200/songs/'
    
  2. create the mapping for songs/song

     curl -X PUT -H "Cache-Control: no-cache" -d '{
         "song": {
             "_source": {"enabled": true},
             "_all": {
                 "indexAnalyzer": "ik_analysis",
                 "searchAnalyzer": "ik_analysis",
                 "term_vector": "no",
                 "store": "true"
             },
             "properties": {
                 "title": {
                     "type": "string",
                     "store": "yes",
                     "indexAnalyzer": "ik_analysis",
                     "searchAnalyzer": "ik_analysis",
                     "include_in_all": "true"
                 }
             }
         }
     }' 'http://localhost:9200/songs/_mapping/song'
    
  3. test it

     curl -X POST  -d '林夕为我们作词' 'http://localhost:9200/songs/_analyze?analyzer=ik_analysis'
    
     response:
     {"tokens":[{"token":"林夕","start_offset":0,"end_offset":2,"type":"CN_WORD","position":1},{"token":"作词","start_offset":5,"end_offset":7,"type":"CN_WORD","position":2}]}
    

Create an empty SQLite3 database for es-ik-sqlite3

  1. create database

     sqlite3 dictionary.db
    
  2. create tables

     CREATE TABLE main_dictionary(term TEXT NOT NULL,unique(term));
     CREATE TABLE quantifier_dictionary(term TEXT NOT NULL,unique(term));
     CREATE TABLE stopword_dictionary(term TEXT NOT NULL,unique(term));
    

For reference, 617052 records take up about 30 MB as a db file.
