BioMedICUS Tokenizer

A lightweight (small and dependency-free) Java 8 library for Penn-like tokenization. This was developed as a stand-alone component of BioMedICUS, a biomedical and clinical NLP engine developed by the NLP-IE Group at the University of Minnesota Institute for Health Informatics.
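"Penn-like" refers to the tokenization conventions of the Penn Treebank, such as splitting off punctuation, possessive 's, and the negation clitic n't. The toy sketch below shows what a few of those conventions look like. It is not the BioMedICUS algorithm. The class name `PennSketch` and its regular expression are hypothetical. Real Penn-style tokenizers also handle abbreviations, quotes, hyphens, and more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy illustrating a few Penn Treebank conventions; not the BioMedICUS implementation. */
public class PennSketch {
  // Alternatives are tried left to right, so clitic handling wins over plain words.
  private static final Pattern TOKEN = Pattern.compile(
      "[A-Za-z]+(?=n't)"                        // stem before a negation clitic: "ca" in "can't"
          + "|n't"                               // the negation clitic itself
          + "|'(?:s|re|ll|ve|d|m)(?![A-Za-z])"   // clitics such as 's, 're, 'll
          + "|[A-Za-z0-9]+"                      // ordinary words and numbers
          + "|[^\\sA-Za-z0-9]");                 // any other non-space character, e.g. punctuation

  /** Returns the tokens of text under the toy rules above. */
  public static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    Matcher m = TOKEN.matcher(text);
    while (m.find()) {
      tokens.add(m.group());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // prints "[I, ca, n't, see, John, 's, dog, .]"
    System.out.println(tokenize("I can't see John's dog."));
  }
}
```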

License	Apache License, Version 2.0
GroupId	edu.umn.biomedicus
ArtifactId	biomedicus-tokenizer
Last Version	0.0.3
Release Date	Jan 5, 2019
Type	jar
Description	A lightweight (small and dependency-free) Java 8 library for Penn-like tokenization. This was developed as a stand-alone component of BioMedICUS, a biomedical and clinical NLP engine developed by the NLP-IE Group at the University of Minnesota Institute for Health Informatics.
Project URL	https://github.com/nlpie/biomedicus-tokenizer
Project Organization	University of Minnesota Institute for Health Informatics NLP/IE Program
Source Code Management	https://github.com/nlpie/biomedicus-tokenizer

Download biomedicus-tokenizer

Filename	Size
biomedicus-tokenizer-0.0.3.pom
biomedicus-tokenizer-0.0.3.jar	10 KB
biomedicus-tokenizer-0.0.3-sources.jar	9 KB
biomedicus-tokenizer-0.0.3-javadoc.jar	32 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/edu.umn.biomedicus/biomedicus-tokenizer/ -->
<dependency>
    <groupId>edu.umn.biomedicus</groupId>
    <artifactId>biomedicus-tokenizer</artifactId>
    <version>0.0.3</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/edu.umn.biomedicus/biomedicus-tokenizer/
implementation 'edu.umn.biomedicus:biomedicus-tokenizer:0.0.3'

Gradle Kotlin

// https://jarcasting.com/artifacts/edu.umn.biomedicus/biomedicus-tokenizer/
implementation ("edu.umn.biomedicus:biomedicus-tokenizer:0.0.3")

Apache Buildr

'edu.umn.biomedicus:biomedicus-tokenizer:jar:0.0.3'

Apache Ivy

<dependency org="edu.umn.biomedicus" name="biomedicus-tokenizer" rev="0.0.3">
  <artifact name="biomedicus-tokenizer" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='edu.umn.biomedicus', module='biomedicus-tokenizer', version='0.0.3')
)

Scala SBT

libraryDependencies += "edu.umn.biomedicus" % "biomedicus-tokenizer" % "0.0.3"

Leiningen

[edu.umn.biomedicus/biomedicus-tokenizer "0.0.3"]

Dependencies

compile (2)

Group / Artifact	Type	Version
org.slf4j : slf4j-api	jar	1.7.25
com.google.code.findbugs : jsr305 (optional)	jar	3.0.2

test (3)

Group / Artifact	Type	Version
org.junit.jupiter : junit-jupiter-engine	jar	5.3.2
org.mockito : mockito-core	jar	2.23.4
org.slf4j : slf4j-nop	jar	1.7.25

Project Modules

There are no modules declared in this project.

BioMedICUS Tokenizer

Using in your project

To use it in a Maven project, include the following in your POM:

<dependencies>
  <dependency>
    <groupId>edu.umn.biomedicus</groupId>
    <artifactId>biomedicus-tokenizer</artifactId>
    <version>0.0.3</version>
  </dependency>
</dependencies>

Alternatively, download the .jar and add it to your project's classpath.

Detecting tokens from strings

Iteratively

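import edu.umn.biomedicus.tokenization.Tokenizer;
import edu.umn.biomedicus.tokenization.TokenResult;

public class Example {
  public void example() {
    String text = "An example sentence.";
    // Iterate over the tokens detected in the text.
    for (TokenResult result : Tokenizer.tokenize(text)) {
      // Resolve each result against the original text to get its characters.
      CharSequence tokenText = result.text(text);
      System.out.println(tokenText);
    }
  }
}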
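Because `result.text(text)` takes the original string, a token result presumably records where the token sits in that string rather than a copy of its characters. The sketch below illustrates that offset-based design with a lazily evaluated `Iterable`. Everything in it is hypothetical: `LazySpans`, `Span`, its `begin`/`end` fields, and the simple "letters/digits or single punctuation" rule are stand-ins, not the library's types or tokenization rules.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of offset-based, lazily produced tokens; not the BioMedICUS API. */
public class LazySpans {
  /** A token as half-open [begin, end) character offsets into the source text. */
  public static final class Span {
    public final int begin;
    public final int end;

    Span(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }

    /** Recovers the token's characters from the original text. */
    public CharSequence text(CharSequence source) {
      return source.subSequence(begin, end);
    }
  }

  /** Lazily yields runs of letters/digits, or single non-space punctuation characters. */
  public static Iterable<Span> tokenize(CharSequence text) {
    return () -> new Iterator<Span>() {
      private int pos = skipSpaces(text, 0);

      @Override
      public boolean hasNext() {
        return pos < text.length();
      }

      @Override
      public Span next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        int begin = pos;
        if (Character.isLetterOrDigit(text.charAt(pos))) {
          while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) {
            pos++;
          }
        } else {
          pos++; // a single punctuation character is its own token
        }
        Span span = new Span(begin, pos);
        pos = skipSpaces(text, pos); // move to the start of the next token, if any
        return span;
      }
    };
  }

  private static int skipSpaces(CharSequence text, int pos) {
    while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
      pos++;
    }
    return pos;
  }

  public static void main(String[] args) {
    String text = "An example sentence.";
    for (Span span : tokenize(text)) {
      // prints "0-2 An", "3-10 example", "11-19 sentence", "19-20 ."
      System.out.println(span.begin + "-" + span.end + " " + span.text(text));
    }
  }
}
```

Storing offsets keeps each result small and lets callers map tokens back to their exact position in the source, which matters for downstream annotation.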

All at once

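import edu.umn.biomedicus.tokenization.Tokenizer;
import edu.umn.biomedicus.tokenization.TokenResult;

import java.util.List;

public class Example {
  public void example() {
    String text = "An example sentence.";
    // Collect every token in the text into a list up front.
    List<TokenResult> results = Tokenizer.allTokens(text);
    for (TokenResult result : results) {
      // Resolve each result against the original text to get its characters.
      CharSequence tokenText = result.text(text);
      System.out.println(tokenText);
    }
  }
}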
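One reason to collect all tokens at once is random access. As a hypothetical sketch, suppose each token is represented by half-open [begin, end) character offsets, sorted and non-overlapping. The `Span` class below is an illustrative stand-in, not the library's `TokenResult`, whose offset accessors are not documented here. A binary search over the list then finds the token covering a given character offset in O(log n).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: locate the token covering a character offset; not the BioMedICUS API. */
public class TokenLookup {
  /** Illustrative token: half-open [begin, end) offsets into the source text. */
  public static final class Span {
    public final int begin;
    public final int end;

    public Span(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }
  }

  /**
   * Returns the index of the token containing offset, or -1 if offset falls between tokens.
   * Assumes tokens are sorted by begin and do not overlap.
   */
  public static int tokenAt(List<Span> tokens, int offset) {
    int lo = 0;
    int hi = tokens.size() - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      Span s = tokens.get(mid);
      if (offset < s.begin) {
        hi = mid - 1;
      } else if (offset >= s.end) {
        lo = mid + 1;
      } else {
        return mid;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    String text = "An example sentence.";
    // Hand-written offsets standing in for a tokenizer's output on this text.
    List<Span> tokens = Arrays.asList(
        new Span(0, 2), new Span(3, 10), new Span(11, 19), new Span(19, 20));
    int i = tokenAt(tokens, 5);
    Span hit = tokens.get(i);
    System.out.println(i + " " + text.substring(hit.begin, hit.end)); // prints "1 example"
  }
}
```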

Javadoc

The API documentation (Javadoc) for this project is published as biomedicus-tokenizer-0.0.3-javadoc.jar, listed in the downloads above.

Contact and Support

To report issues or request enhancements, please use the Issues tab of the GitHub repository.

BioMedICUS has a gitter chat and a Google Group for contacting developers with questions, suggestions or feedback.

About Us

BioMedICUS is developed by the University of Minnesota Institute for Health Informatics NLP/IE Group with assistance from the Open Health Natural Language Processing (OHNLP) Consortium.

Contributing

Anyone is welcome and encouraged to contribute. If you discover a bug or think the project could use an enhancement, follow these steps:

1. Create an issue and offer to code a solution. We can discuss the issue and decide whether any code would be a good addition to the project.
2. Fork the project: https://github.com/nlpie/biomedicus-tokenizer/fork
3. Create a feature branch (git checkout -b feature-name).
4. Code your solution.
   - Follow the Google style guide for Java. IDE profiles for the style guide are available.
   - Write unit tests for any non-trivial aspects of your code. If you are fixing a bug, write a regression test: one that confirms the behavior you fixed stays fixed.
5. Commit to your branch (git commit -am 'Summary of changes').
6. Push to GitHub (git push origin feature-name).
7. Create a pull request on this repository from your forked project. We will review and discuss your code and merge it.

Natural Language Processing / Information Extraction (NLP/IE) Program

The Natural Language Processing / Information Extraction (NLP/IE) Program at the University of Minnesota Institute for Health Informatics

Versions

Version	Release Date
0.0.3	Jan 5, 2019
0.0.2	Jan 4, 2019
0.0.1	Mar 29, 2018
