com.cldellow:segmenter

Segment strings into words.

GroupId: com.cldellow
ArtifactId: segmenter
Last Version: 0.0.3
Type: jar
Description: Segment strings into words.
Project URL: https://github.com/cldellow/segmenter
Source Code Management: https://github.com/cldellow/segmenter/tree/master

How to add to project

Maven:

<dependency>
    <groupId>com.cldellow</groupId>
    <artifactId>segmenter</artifactId>
    <version>0.0.3</version>
</dependency>

Gradle (Groovy DSL):

implementation 'com.cldellow:segmenter:0.0.3'

Gradle (Kotlin DSL):

implementation("com.cldellow:segmenter:0.0.3")

Buildr:

'com.cldellow:segmenter:jar:0.0.3'

Ivy:

<dependency org="com.cldellow" name="segmenter" rev="0.0.3">
  <artifact name="segmenter" type="jar" />
</dependency>

Groovy Grape:

@Grapes(
    @Grab(group='com.cldellow', module='segmenter', version='0.0.3')
)

sbt:

libraryDependencies += "com.cldellow" % "segmenter" % "0.0.3"

Leiningen:

[com.cldellow/segmenter "0.0.3"]

Dependencies

compile (1)

Group / Artifact                             Type  Version
com.hankcs : aho-corasick-double-array-trie  jar   1.2.1

test (1)

Group / Artifact  Type  Version
junit : junit     jar   4.12

Project Modules

There are no modules declared in this project.

segmenter

Segment short strings into words.

Usage

The easiest way to get started is to create a map of word probabilities:

import java.util.HashMap;

// Map each known word to its probability.
HashMap<String, Double> probabilities = new HashMap<>();
probabilities.put("eats", 0.2);
probabilities.put("at", 0.2);
probabilities.put("eat", 0.1);
probabilities.put("sat", 0.1);

Segmenter segmenter = new Segmenter(probabilities);
Result result = segmenter.segment("eatsat", 2, 2, 0);

result.getPhrase(0); // "eats at"
result.getPhrase(1); // "eat sat"
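
To make the ranking in the example above concrete, here is a minimal, self-contained sketch of one way candidate segmentations could be scored: multiply the probabilities of the words in each candidate and sort by the product. This is an assumption for illustration only, not the library's actual algorithm or API. The names SegmentSketch, Candidate, and rank are hypothetical, and the sketch enumerates every split by brute force rather than using a trie.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: brute-force ranking of segmentations, not the library's code. */
final class SegmentSketch {
    /** One candidate segmentation and the product of its word probabilities. */
    static final class Candidate {
        final String phrase;
        final double score;

        Candidate(String phrase, double score) {
            this.phrase = phrase;
            this.score = score;
        }
    }

    /** Returns every way to split text into known words, highest score first. */
    static List<Candidate> rank(String text, Map<String, Double> probabilities) {
        List<Candidate> out = new ArrayList<>();
        collect(text, 0, "", 1.0, probabilities, out);
        out.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        return out;
    }

    /** Tries every known word starting at position start, then recurses on the rest. */
    private static void collect(String text, int start, String prefix, double score,
                                Map<String, Double> probabilities, List<Candidate> out) {
        if (start == text.length()) {
            out.add(new Candidate(prefix, score));
            return;
        }
        for (int end = start + 1; end <= text.length(); end++) {
            String word = text.substring(start, end);
            Double p = probabilities.get(word);
            if (p != null) {
                String next = prefix.isEmpty() ? word : prefix + " " + word;
                collect(text, end, next, score * p, probabilities, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = new HashMap<>();
        probabilities.put("eats", 0.2);
        probabilities.put("at", 0.2);
        probabilities.put("eat", 0.1);
        probabilities.put("sat", 0.1);

        // Prints the candidate phrases for "eatsat", best first.
        for (Candidate c : rank("eatsat", probabilities)) {
            System.out.println(c.phrase);
        }
    }
}
```

With the four words above, "eats at" scores 0.2 × 0.2 and "eat sat" scores 0.1 × 0.1, so under this scoring assumption "eats at" ranks first, which is consistent with the order shown in the usage example.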

Under the covers, the Segmenter converts the map into a trie. Because constructing the trie is slow, you can instead pass an already-constructed trie (for example, one deserialized from a previous run) to skip that step.
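
The build-once, reload-later idea can be sketched with plain Java serialization. The example below is an assumption-laden illustration: SimpleTrie is a hypothetical stand-in for a slow-to-build trie, not the trie type the library uses, and the source does not show which Segmenter constructor accepts a prebuilt trie, so that call is left out. It only shows that a structure built from the probability map can be written to bytes and restored without rebuilding it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a slow-to-build trie; not the library's type. */
final class SimpleTrie implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final HashMap<Character, Node> children = new HashMap<>();
        Double probability; // non-null only where a word ends
    }

    private final Node root = new Node();

    /** Builds the trie from a word-to-probability map (the "slow" step). */
    SimpleTrie(Map<String, Double> probabilities) {
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            Node node = root;
            for (char ch : e.getKey().toCharArray()) {
                node = node.children.computeIfAbsent(ch, k -> new Node());
            }
            node.probability = e.getValue();
        }
    }

    /** Returns the probability stored for word, or null if it is not a known word. */
    Double lookup(String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.get(ch);
            if (node == null) {
                return null;
            }
        }
        return node.probability;
    }

    /** Serializes the built trie so a later run can skip construction. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a trie previously written by toBytes(). */
    static SimpleTrie fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SimpleTrie) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In practice the bytes would be written to a file or cache at build time and read back at startup. Whether that is faster than rebuilding depends on the trie implementation and the size of the word list.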

The Segmenter class is thread-safe.

Versions

Version
0.0.3
0.0.2
0.0.1