exude

The exude library filters stop, stemming, and swear words from a file or text.

License: Apache License, Version 2.0
GroupId: com.uttesh
ArtifactId: exude
Last Version: 0.0.4
Type: jar
Description: The exude library filters stop, stemming, and swear words from a file or text.
Project URL: http://maven.apache.org
Source Code Management: https://github.com/uttesh/exude

Download exude

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.uttesh/exude/ -->
<dependency>
    <groupId>com.uttesh</groupId>
    <artifactId>exude</artifactId>
    <version>0.0.4</version>
</dependency>

Gradle (Groovy):

// https://jarcasting.com/artifacts/com.uttesh/exude/
implementation 'com.uttesh:exude:0.0.4'

Gradle (Kotlin):

// https://jarcasting.com/artifacts/com.uttesh/exude/
implementation ("com.uttesh:exude:0.0.4")

Buildr:

'com.uttesh:exude:jar:0.0.4'

Ivy:

<dependency org="com.uttesh" name="exude" rev="0.0.4">
  <artifact name="exude" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.uttesh', module='exude', version='0.0.4')
)

SBT:

libraryDependencies += "com.uttesh" % "exude" % "0.0.4"

Leiningen:

[com.uttesh/exude "0.0.4"]

Dependencies

compile (5)

Group / Artifact                  Type  Version
org.apache.tika : tika-core       jar   1.11
org.apache.tika : tika-parsers    jar   1.11
org.apache.tika : tika-xmp        jar   1.11
org.testng : testng               jar   6.9.10
org.slf4j : slf4j-api             jar   1.7.21

test (4)

Group / Artifact                  Type  Version
org.assertj : assertj-core        jar   3.5.2
org.jmockit : jmockit             jar   1.28
ch.qos.logback : logback-classic  jar   1.1.7
org.slf4j : jul-to-slf4j          jar   1.7.21

Project Modules

There are no modules declared in this project.

exude

This is a simple library for removing or filtering stop words and stemming words from text data. The library is at a very basic stage of development and still needs further work.

The library is now available in the Maven repository. Add the following dependency directly to your pom.xml:

    <dependency>
        <groupId>com.uttesh</groupId>
        <artifactId>exude</artifactId>
        <version>0.0.4</version>
    </dependency>

How to use exude Library


Download the latest version of exude.

Features:

  • Filter stop words from a given text, file, or link
  • Filter stemming words from a given text, file, or link
  • Get swear words from a given text, file, or link

How Exude library works:

Step 1: Filter duplicate words from the input data or file.
Step 2: Filter stop words from the output of step 1.
Step 3: Filter stemming and swear words from the output of step 2, using the Porter algorithm for suffix stripping.
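
The three steps above can be sketched in plain Java. This is an illustration only, not exude's implementation: the class name `FilterPipelineSketch`, the tiny stop-word list, and the three-suffix stripper are all assumptions made for the example. In particular, `stripSuffix` is a crude stand-in for the Porter algorithm, and the sketch reads step 3 as reducing each word to a stem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; this is NOT how exude is implemented.
final class FilterPipelineSketch {

    // Tiny sample stop-word list (an assumption; real lists are far larger).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "in", "of", "the", "to"));

    // Step 1: lowercase, split on non-letters, drop duplicates (first-seen order kept).
    static Set<String> uniqueWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Step 2: remove every word found in the stop-word list.
    static List<String> removeStopWords(Iterable<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    // Step 3: crude suffix stripping (a simplified stand-in for Porter stemming).
    // A suffix is removed only if at least three characters remain.
    static String stripSuffix(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Runs the three steps in order and joins the distinct stems with spaces.
    static String filter(String text) {
        Set<String> stems = new LinkedHashSet<>();
        for (String word : removeStopWords(uniqueWords(text))) {
            stems.add(stripSuffix(word));
        }
        return String.join(" ", stems);
    }
}
```

With this sketch, `FilterPipelineSketch.filter("The cats walked and the cat is walking")` returns `cat walk`. The real Porter algorithm applies many more rules, so exude's actual output for the same input may differ.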

[Figure: exude process sequence flow]

Environment and dependent jar files


  1. JDK 1.6 or higher
  2. Apache Tika jar (used to parse files for data extraction)

Sample code:

Sample Text Data

String inputData = "Kannada is a Southern Dravidian language, and according to Dravidian scholar Sanford Steever, its history can be conventionally divided into three periods; Old Kannada (halegannada) from 450–1200 A.D., Middle Kannada (Nadugannada) from 1200–1700 A.D., and Modern Kannada from 1700 to the present.[20] Kannada is influenced to an appreciable extent by Sanskrit. Influences of other languages such as Prakrit and Pali can also be found in Kannada language.";
String output = ExudeData.getInstance().filterStoppings(inputData);
System.out.println("output : " + output);

Sample File Data

// Replace the placeholder with the path of the file to filter.
String inputData = "any file path";
String output = ExudeData.getInstance().filterStoppings(inputData);
System.out.println("output : " + output);

Sample Link Data

// Pass a link in place of raw text.
String inputData = "https://en.wikipedia.org/wiki/Rama";
String output = ExudeData.getInstance().filterStoppings(inputData);
System.out.println("output : " + output);
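
The three samples above pass text, a file path, and a link through the same `filterStoppings` call. How exude tells these inputs apart is not described here, so the following is only a sketch of one way a caller could check which kind of input a string represents before handing it to the library. The class `InputKindSketch` and its `classify` method are hypothetical names introduced for this example.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical helper; exude's own input detection may work differently.
final class InputKindSketch {

    enum Kind { LINK, FILE, TEXT }

    // Classifies a string as a web link, an existing regular file, or plain text.
    static Kind classify(String input) {
        String trimmed = input.trim();
        if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
            return Kind.LINK;
        }
        try {
            if (Files.isRegularFile(Paths.get(trimmed))) {
                return Kind.FILE;
            }
        } catch (InvalidPathException e) {
            // The string cannot be a path on this platform, so treat it as text.
        }
        return Kind.TEXT;
    }
}
```

A check like this can help when logging or validating input, for example to report a mistyped file path that would otherwise be filtered as if it were ordinary text.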

Get swear words from data/file/link

// Replace the placeholder with the text, file path, or link to scan.
String inputData = "enter text with bad words";
String output = ExudeData.getInstance().getSwearWords(inputData);
System.out.println("output : " + output);
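
To make the idea behind the swear-word lookup concrete, here is a minimal sketch of matching tokens against a blocklist. It is an assumption about the general technique, not exude's code: `WordMatchSketch`, `matches`, and the blocklist passed in are all hypothetical, and exude's own word list and return format may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative blocklist lookup; not exude's implementation.
final class WordMatchSketch {

    // Returns each blocklisted word that occurs in the text, once, in first-seen order.
    static List<String> matches(String text, Set<String> blocklist) {
        List<String> found = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (blocklist.contains(token) && !found.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

The blocklist entries are expected in lowercase, since the text is lowercased before comparison.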

New Feature:

  1. Keep duplicate words in the output when filtering stop words (filterStoppingsKeepDuplicates)

Sample Text Data

String inputData = "testing testing testing the keep keep the the duplicate data data in result";
String output = ExudeData.getInstance().filterStoppingsKeepDuplicates(inputData);
System.out.println("output : " + output);
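
The difference between dropping and keeping duplicates can be shown with a small standalone sketch. It is not exude's implementation: `KeepDuplicatesSketch`, its `filterStopWords` method, and the two-word stop list are assumptions chosen to fit the sample sentence above, so exude's real output may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only; the stop-word list here is a tiny assumed sample.
final class KeepDuplicatesSketch {

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "in"));

    // Removes stop words; repeated words survive only when keepDuplicates is true.
    static String filterStopWords(String text, boolean keepDuplicates) {
        Collection<String> kept;
        if (keepDuplicates) {
            kept = new ArrayList<>();      // a list keeps every occurrence
        } else {
            kept = new LinkedHashSet<>();  // a set keeps the first occurrence only
        }
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }
}
```

For the sample sentence, this sketch returns `testing testing testing keep keep duplicate data data result` with `keepDuplicates` set to `true`, and `testing keep duplicate data result` with it set to `false`.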

Contributions

Credit to Apache Tika, which is used to parse files for data extraction.

Exude library Developer : uttesh.com

License

(The Apache License)

Copyright (c) 2018 Uttesh Kumar T.H.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Versions

Version
0.0.4
0.0.3
0.0.2
0.0.1