com.github.monnetproject.bliss.sparsemath

com.github.monnetproject.bliss.sparsemath from the Monnet Project's bliss project.

License:
Categories: Net
GroupId: com.github.monnetproject
ArtifactId: bliss.sparsemath
Last Version: 1.18.4
Release Date:
Type: jar
Description: com.github.monnetproject.bliss.sparsemath from the Monnet Project's bliss project.
Project URL: https://github.com/monnetproject/bliss
Source Code Management: http://github.com/monnetproject/bliss/tree/master

Download bliss.sparsemath

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.github.monnetproject/bliss.sparsemath/ -->
<dependency>
    <groupId>com.github.monnetproject</groupId>
    <artifactId>bliss.sparsemath</artifactId>
    <version>1.18.4</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.github.monnetproject/bliss.sparsemath/
implementation 'com.github.monnetproject:bliss.sparsemath:1.18.4'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.github.monnetproject/bliss.sparsemath/
implementation ("com.github.monnetproject:bliss.sparsemath:1.18.4")

Buildr:

'com.github.monnetproject:bliss.sparsemath:jar:1.18.4'

Ivy:

<dependency org="com.github.monnetproject" name="bliss.sparsemath" rev="1.18.4">
  <artifact name="bliss.sparsemath" type="jar" />
</dependency>

Groovy Grape:

@Grapes(
    @Grab(group='com.github.monnetproject', module='bliss.sparsemath', version='1.18.4')
)

SBT:

libraryDependencies += "com.github.monnetproject" % "bliss.sparsemath" % "1.18.4"

Leiningen:

[com.github.monnetproject/bliss.sparsemath "1.18.4"]

Dependencies

compile (2)

Group / Artifact             Type   Version
junit : junit                jar    4.10
it.unimi.dsi : fastutil      jar    6.4.4

test (3)

Group / Artifact                          Type   Version
org.apache.commons : commons-math         jar    2.2
org.apache.commons : commons-compress     jar    1.4.1
colt : colt                               jar    1.2.0

Project Modules

There are no modules declared in this project.

Bilingual Similarity Suite (BLISS)

This package provides a set of tools for topic modelling, in particular for the cross-lingual case, and for its application to machine translation. The following algorithms are implemented:

  • Latent Dirichlet Allocation
  • Cross-Lingual Explicit Semantic Analysis

The following are planned:

  • Kernel Explicit Semantic Analysis
  • Latent Semantic Analysis
  • Coupled Probabilistic Latent Semantic Analysis
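As a rough illustration of the idea behind Cross-Lingual Explicit Semantic Analysis, listed above as implemented, the sketch below maps a document to a vector of similarities against a set of concept articles in its own language, then compares documents across languages through those vectors. It assumes the concept articles are aligned by index across the two languages (as with linked Wikipedia articles) and uses raw term counts where a real system would typically use TF-IDF weights. The class ClesaSketch and all of its methods are hypothetical names for this example and are not the BLISS API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy CL-ESA sketch; names are illustrative and not part of BLISS. */
final class ClesaSketch {

    /** Counts whitespace-separated, lower-cased tokens. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between two sparse term-count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Cosine similarity between two dense vectors of equal length. */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return (normX == 0.0 || normY == 0.0) ? 0.0 : dot / Math.sqrt(normX * normY);
    }

    /** Maps a document to concept space: one similarity score per concept article. */
    static double[] esaVector(String doc, List<String> conceptArticles) {
        Map<String, Integer> docTerms = termCounts(doc);
        double[] vector = new double[conceptArticles.size()];
        for (int k = 0; k < vector.length; k++) {
            vector[k] = cosine(docTerms, termCounts(conceptArticles.get(k)));
        }
        return vector;
    }

    /** Cross-lingual similarity: compare two documents via their concept vectors. */
    static double similarity(String docA, List<String> conceptsA,
                             String docB, List<String> conceptsB) {
        return cosine(esaVector(docA, conceptsA), esaVector(docB, conceptsB));
    }

    public static void main(String[] args) {
        // Two toy concepts; index k in each list describes the same concept.
        List<String> en = Arrays.asList("dog animal pet", "car engine road");
        List<String> de = Arrays.asList("hund tier haustier", "auto motor strasse");
        System.out.println(similarity("my dog is a pet", en, "der hund ist ein haustier", de));
        System.out.println(similarity("my dog is a pet", en, "das auto hat einen motor", de));
    }
}
```

The two documents never share a token, yet the first pair scores higher than the second because both documents activate the same concept. That indirection through aligned concepts is what makes the comparison cross-lingual.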

Building

Translation Topics is built with Maven and can be installed with the following command:

mvn install

Building a corpus

Existing scripts build a corpus by downloading the data from Wikipedia. For example, for English to German, run:

./build-wikipedia-article.sh en de

Mate-finding trials

Mate-finding trials can be run with the following command, from the experiments sub-folder:

mvn exec:java -Dexec.mainClass=eu.monnetproject.bliss.experiments.MateFindingTrial \
       -Dexec.args="trainFile metricFactory W testFile"

Here W is the number of distinct tokens in the corpus and metricFactory is one of:

  • eu.monnetproject.bliss.clesa.CLESA: For CL-ESA
  • (More to come)
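The following sketch shows one way such a trial can be scored. It assumes that a mate-finding trial asks, for each test document, whether its aligned translation (its "mate") is ranked as the most similar document on the other side, and that documents are already mapped to dense vectors in a shared space. The class MateFindingSketch, its methods, and the choice of precision at 1 and mean reciprocal rank as scores are assumptions made for this example; they are not taken from the MateFindingTrial class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Scoring sketch for mate finding; names are illustrative and not part of BLISS. */
final class MateFindingSketch {

    /** Rank (1 = best) of target i among all targets when queried with source i. */
    static int mateRank(int i, List<double[]> source, List<double[]> target,
                        ToDoubleBiFunction<double[], double[]> sim) {
        double mateScore = sim.applyAsDouble(source.get(i), target.get(i));
        int rank = 1;
        for (int j = 0; j < target.size(); j++) {
            // Every non-mate that scores strictly higher pushes the mate down one place.
            if (j != i && sim.applyAsDouble(source.get(i), target.get(j)) > mateScore) {
                rank++;
            }
        }
        return rank;
    }

    /** Fraction of source documents whose mate is ranked first. */
    static double precisionAt1(List<double[]> source, List<double[]> target,
                               ToDoubleBiFunction<double[], double[]> sim) {
        int hits = 0;
        for (int i = 0; i < source.size(); i++) {
            if (mateRank(i, source, target, sim) == 1) {
                hits++;
            }
        }
        return source.isEmpty() ? 0.0 : (double) hits / source.size();
    }

    /** Mean of 1/rank of the mate over all source documents. */
    static double meanReciprocalRank(List<double[]> source, List<double[]> target,
                                     ToDoubleBiFunction<double[], double[]> sim) {
        double sum = 0.0;
        for (int i = 0; i < source.size(); i++) {
            sum += 1.0 / mateRank(i, source, target, sim);
        }
        return source.isEmpty() ? 0.0 : sum / source.size();
    }

    /** Cosine similarity between two dense vectors of equal length. */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return (normX == 0.0 || normY == 0.0) ? 0.0 : dot / Math.sqrt(normX * normY);
    }

    public static void main(String[] args) {
        // Document i in source is assumed to be the translation of document i in target.
        List<double[]> source = Arrays.asList(new double[] {1, 0}, new double[] {0, 1});
        List<double[]> target = Arrays.asList(new double[] {0.9, 0.1}, new double[] {0.2, 0.8});
        System.out.println(precisionAt1(source, target, MateFindingSketch::cosine));
        System.out.println(meanReciprocalRank(source, target, MateFindingSketch::cosine));
    }
}
```

Passing the similarity as a function keeps the scoring independent of the metric, which mirrors the way the command above takes the metric as a separate metricFactory argument.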

Language model adaptation

Language models can be trained with the following command (from the betalm folder):

mvn exec:java -Dexec.mainClass="betalm.compile" -Dexec.args="corpus.gz N wordMap W lmFile"

Here N is the order of the n-gram model and W is the number of distinct tokens. To adapt the model to a specific document, add the following flags to -Dexec.args:

    -Dexec.args="-b METHOD -f file[.gz] ..." 

where METHOD is one of:

  • COS_SIM
  • NORMAL_COS_SIM
  • KLD
  • JACCARD
  • DICE
  • ROGERS_TANIMOTO
  • DF_JACCARD
  • DF_DICE
  • WxWCLESA
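Several of the method names above correspond to standard similarity measures. The sketch below gives textbook definitions of cosine similarity, the Jaccard and Dice coefficients, and the Rogers-Tanimoto coefficient over sparse documents keyed by token id. It is an assumption that COS_SIM, JACCARD, DICE and ROGERS_TANIMOTO follow these definitions; the exact BLISS implementations may differ, and NORMAL_COS_SIM, KLD, DF_JACCARD, DF_DICE and WxWCLESA are not covered. The class SimilaritySketch and its methods are hypothetical names for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Textbook similarity measures over sparse documents; not the BLISS API. */
final class SimilaritySketch {

    /** Cosine similarity between two sparse vectors keyed by token id. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Walk the smaller map so the cost depends on the sparser vector.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = small == a ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double norms = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    private static double sumOfSquares(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return sum;
    }

    /** Number of token ids present in both sets. */
    static int intersectionSize(Set<Integer> a, Set<Integer> b) {
        Set<Integer> small = a.size() <= b.size() ? a : b;
        Set<Integer> large = small == a ? b : a;
        int count = 0;
        for (Integer id : small) {
            if (large.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** Jaccard coefficient: |A ∩ B| / |A ∪ B|. */
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        int both = intersectionSize(a, b);
        int union = a.size() + b.size() - both;
        return union == 0 ? 0.0 : (double) both / union;
    }

    /** Dice coefficient: 2 |A ∩ B| / (|A| + |B|). */
    static double dice(Set<Integer> a, Set<Integer> b) {
        int total = a.size() + b.size();
        return total == 0 ? 0.0 : 2.0 * intersectionSize(a, b) / total;
    }

    /** Rogers-Tanimoto coefficient over a vocabulary of w distinct tokens. */
    static double rogersTanimoto(Set<Integer> a, Set<Integer> b, int w) {
        int both = intersectionSize(a, b);
        int mismatches = a.size() + b.size() - 2 * both; // present in exactly one set
        int neither = w - (a.size() + b.size() - both);  // absent from both sets
        int agreements = both + neither;
        int denominator = agreements + 2 * mismatches;
        return denominator == 0 ? 0.0 : (double) agreements / denominator;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(0, 1.0);
        a.put(1, 1.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(1, 1.0);
        b.put(2, 1.0);
        System.out.println(cosine(a, b));
        System.out.println(jaccard(a.keySet(), b.keySet()));
        System.out.println(dice(a.keySet(), b.keySet()));
        System.out.println(rogersTanimoto(a.keySet(), b.keySet(), 4));
    }
}
```

Rogers-Tanimoto is the only one of these that also counts tokens absent from both documents as agreement, which is why it needs the vocabulary size W.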