pLSA in Java

Java implementation of probabilistic latent semantic analysis (pLSA)

License: MIT
Categories: Java Languages
GroupId: com.github.chen0040
ArtifactId: java-plsa
Last Version: 1.0.1
Type: jar
Project URL: https://github.com/chen0040/java-plsa
Source Code Management: https://github.com/chen0040/java-plsa

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.github.chen0040/java-plsa/ -->
<dependency>
    <groupId>com.github.chen0040</groupId>
    <artifactId>java-plsa</artifactId>
    <version>1.0.1</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.github.chen0040/java-plsa/
implementation 'com.github.chen0040:java-plsa:1.0.1'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.github.chen0040/java-plsa/
implementation ("com.github.chen0040:java-plsa:1.0.1")

Buildr:

'com.github.chen0040:java-plsa:jar:1.0.1'

Ivy:

<dependency org="com.github.chen0040" name="java-plsa" rev="1.0.1">
  <artifact name="java-plsa" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.github.chen0040', module='java-plsa', version='1.0.1')
)

SBT:

libraryDependencies += "com.github.chen0040" % "java-plsa" % "1.0.1"

Leiningen:

[com.github.chen0040/java-plsa "1.0.1"]

Dependencies

compile (2)

Group : Artifact                       Type  Version
com.github.chen0040 : java-data-text   jar   1.0.3
com.github.chen0040 : java-data-frame  jar   1.0.2

provided (1)

Group : Artifact            Type  Version
org.projectlombok : lombok  jar   1.16.6

test (10)

Group : Artifact                         Type  Version
org.testng : testng                      jar   6.9.10
org.hamcrest : hamcrest-core             jar   1.3
org.hamcrest : hamcrest-library          jar   1.3
org.assertj : assertj-core               jar   3.5.2
org.powermock : powermock-core           jar   1.6.5
org.powermock : powermock-api-mockito    jar   1.6.5
org.powermock : powermock-module-junit4  jar   1.6.5
org.powermock : powermock-module-testng  jar   1.6.5
org.mockito : mockito-core               jar   2.0.2-beta
org.mockito : mockito-all                jar   2.0.2-beta

Project Modules

There are no modules declared in this project.

java-plsa

This package provides a Java implementation of probabilistic latent semantic analysis (pLSA).
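
For orientation, the textbook pLSA formulation is given here. It is the standard statement of the model, not something taken from this package's source. Here d is a document, w a word, z a latent topic, and n(d, w) the number of times w occurs in d. Each word in a document is modelled as a mixture over topics, and the parameters are usually fitted by maximizing the log-likelihood of the observed counts:

```latex
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d),
\qquad
\mathcal{L} = \sum_{d} \sum_{w} n(d, w)\, \log P(w \mid d)
```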

Install

Add the following dependency to your POM file:

<dependency>
  <groupId>com.github.chen0040</groupId>
  <artifactId>java-plsa</artifactId>
  <version>1.0.1</version>
</dependency>

Usage

The sample code below illustrates how to perform topic modelling using pLSA:

import java.util.Arrays;
import java.util.List;

// One string per document
List<String> docs = Arrays.asList("[doc-1-content]", "[doc-2-content]", ...);

// Configure the model, then fit it to the documents
pLSA method = new pLSA();
method.setStemmerEnabled(true);
method.setMaxIters(10);
method.setMaxVocabularySize(1000);
method.fit(docs);

// For each topic, print its 3 top-ranked documents and words
for (int topic = 0; topic < method.getTopicCount(); ++topic) {
    List<TupleTwo<Document, Double>> topRankedDocs = method.getTopRankingDocs4Topic(topic, 3);
    List<TupleTwo<String, Double>> topRankedWords = method.getTopRankingWords4Topic(topic, 3);

    System.out.println("Topic " + topic + ": ");

    System.out.println("Top Ranked Documents:");
    for (TupleTwo<Document, Double> entry : topRankedDocs) {
        Document doc = entry._1();
        double score = entry._2();
        System.out.print(doc.docIndex() + "(" + score + "), ");
        System.out.println(doc.content());
    }
    System.out.println();

    System.out.println("Top Ranked Words:");
    for (TupleTwo<String, Double> entry : topRankedWords) {
        String word = entry._1();
        double score = entry._2();
        System.out.print(word + "(" + score + "), ");
    }
    System.out.println();
}

System.out.println("// ============================================= //");

// For each document, print its 3 top-ranked topics
for (int doc = 0; doc < method.getDocCount(); ++doc) {
    List<TupleTwo<Integer, Double>> topRankedTopics = method.getTopRankingTopics4Doc(doc, 3);
    System.out.print("Doc " + doc + ": ");
    for (TupleTwo<Integer, Double> entry : topRankedTopics) {
        int topic = entry._1();
        double score = entry._2();
        System.out.print(topic + "(" + score + "), ");
    }
    System.out.println();
}
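
To give a rough idea of what a pLSA fit involves, here is a minimal, self-contained sketch of the standard EM procedure on a small document-by-word count matrix. It is not the java-plsa implementation: the class `PlsaSketch`, its nested `Model` type, and the `fit(counts, topics, maxIters, seed)` signature are all hypothetical names made up for this illustration, and the sketch skips tokenization, stemming, and vocabulary pruning entirely.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative pLSA via EM on raw counts. Hypothetical code, not part of java-plsa. */
public class PlsaSketch {

    /** pTopicGivenDoc[d][z] = P(z|d), pWordGivenTopic[z][w] = P(w|z). */
    public static final class Model {
        public final double[][] pTopicGivenDoc;
        public final double[][] pWordGivenTopic;

        Model(double[][] pTopicGivenDoc, double[][] pWordGivenTopic) {
            this.pTopicGivenDoc = pTopicGivenDoc;
            this.pWordGivenTopic = pWordGivenTopic;
        }
    }

    /** counts[d][w] is how often word w occurs in document d. */
    public static Model fit(int[][] counts, int topics, int maxIters, long seed) {
        int docs = counts.length;
        int words = counts[0].length;
        Random rnd = new Random(seed);
        double[][] pzd = randomRows(docs, topics, rnd);   // P(z|d)
        double[][] pwz = randomRows(topics, words, rnd);  // P(w|z)

        for (int iter = 0; iter < maxIters; iter++) {
            double[][] nzd = new double[docs][topics];    // expected counts per (d, z)
            double[][] nwz = new double[topics][words];   // expected counts per (z, w)
            for (int d = 0; d < docs; d++) {
                for (int w = 0; w < words; w++) {
                    if (counts[d][w] == 0) continue;
                    // E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                    double[] post = new double[topics];
                    double norm = 0;
                    for (int z = 0; z < topics; z++) {
                        post[z] = pzd[d][z] * pwz[z][w];
                        norm += post[z];
                    }
                    if (norm == 0) continue;
                    for (int z = 0; z < topics; z++) {
                        double r = counts[d][w] * post[z] / norm;
                        nzd[d][z] += r;
                        nwz[z][w] += r;
                    }
                }
            }
            // M-step: renormalize expected counts into probabilities
            for (double[] row : nzd) normalize(row);
            for (double[] row : nwz) normalize(row);
            pzd = nzd;
            pwz = nwz;
        }
        return new Model(pzd, pwz);
    }

    private static double[][] randomRows(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) row[j] = 0.1 + rnd.nextDouble();
            normalize(row);
        }
        return m;
    }

    /** Scales a row to sum to 1; an all-zero row becomes uniform. */
    private static void normalize(double[] row) {
        double sum = 0;
        for (double v : row) sum += v;
        for (int j = 0; j < row.length; j++) {
            row[j] = sum > 0 ? row[j] / sum : 1.0 / row.length;
        }
    }

    public static void main(String[] args) {
        // Toy data: docs 0-1 use only words 0-1, docs 2-3 use only words 2-3
        int[][] counts = {
            {4, 3, 0, 0},
            {3, 5, 0, 0},
            {0, 0, 4, 2},
            {0, 0, 3, 4}
        };
        Model m = fit(counts, 2, 50, 42L);
        for (int d = 0; d < counts.length; d++) {
            System.out.println("Doc " + d + ": " + Arrays.toString(m.pTopicGivenDoc[d]));
        }
    }
}
```

Each row of `pTopicGivenDoc` is a probability distribution over topics for one document, which is the kind of per-document score that a ranking call such as `getTopRankingTopics4Doc` would sort. Whether the library computes its scores this way is an assumption.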

Versions

Version
1.0.1