age-predictor

Ensemble age classification from text using the PAN16, blogs, Fisher Callhome, and Cancer Forum corpora, built with Apache OpenNLP and Apache Spark.

License	The Apache License, Version 2.0
GroupId	edu.usc.ir
ArtifactId	age-predictor
Last Version	1.0
Release Date	Jul 6, 2017
Type	pom
Description	age-predictor Ensemble Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum using Apache OpenNLP, and Apache Spark.
Project URL	http://maven.apache.org
Source Code Management	https://github.com/USCDataScience/AgePredictor.git

Download age-predictor

Filename	Size
age-predictor-1.0.pom	5 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/edu.usc.ir/age-predictor/ -->
<dependency>
    <groupId>edu.usc.ir</groupId>
    <artifactId>age-predictor</artifactId>
    <version>1.0</version>
    <type>pom</type>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/edu.usc.ir/age-predictor/
implementation 'edu.usc.ir:age-predictor:1.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/edu.usc.ir/age-predictor/
implementation ("edu.usc.ir:age-predictor:1.0")

Apache Buildr

'edu.usc.ir:age-predictor:pom:1.0'

Apache Ivy

<dependency org="edu.usc.ir" name="age-predictor" rev="1.0">
  <artifact name="age-predictor" type="pom" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='edu.usc.ir', module='age-predictor', version='1.0')
)

Scala SBT

libraryDependencies += "edu.usc.ir" % "age-predictor" % "1.0"

Leiningen

[edu.usc.ir/age-predictor "1.0"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

age-predictor-opennlp
age-predictor-cli
age-predictor-api
age-predictor-assembly

Author Age Prediction

This is an author age categorizer that leverages the Apache OpenNLP Maximum Entropy Classifier. It takes a text sample and classifies it into one of the following age categories: xx-18|18-24|25-34|35-49|50-64|65-xx.
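As a rough illustration of the six categories, the sketch below maps a numeric age to a category label. The boundary handling is an assumption: the labels xx-18 and 18-24 both mention 18, and this sketch treats xx-18 as "under 18" and 65-xx as "65 and above". The name age_to_category is hypothetical and is not part of the project.

```python
# Hypothetical helper, not part of age-predictor: maps a numeric age to one of
# the six category labels listed above.
# Assumption: "xx-18" covers ages below 18 and "65-xx" covers 65 and above.
UPPER_BOUNDS = [
    (18, "xx-18"),
    (25, "18-24"),
    (35, "25-34"),
    (50, "35-49"),
    (65, "50-64"),
]


def age_to_category(age: int) -> str:
    """Return the category label whose range contains `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for upper, label in UPPER_BOUNDS:
        # Each bound is exclusive: the first bound larger than the age wins.
        if age < upper:
            return label
    return "65-xx"
```

If the project assigns age 18 to xx-18 instead, only the first bound needs to change.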

Usage

How to train an Age Classifier

Note: The training data should contain one sample per line, with each line starting with the age, or age category, followed by a tab and the text associated with that age.

Usage: bin/authorage AgeClassifyTrainer [-factory factoryName] [-featureGenerators featuregens] [-tokenizer tokenizer] -model modelFile [-params paramsFile] -lang language -data sampleData [-encoding charsetName]

Arguments description:
	-factory factoryName
		a sub-class of DoccatFactory where to get implementation and resources.
	-featureGenerators featuregens
		comma separated feature generator classes. Bag of words default.
	-tokenizer tokenizer
		tokenizer implementation. WhitespaceTokenizer is used if not specified.
	-model modelFile
		output model file.
	-params paramsFile
		training parameters file.
	-lang language
		language which is being processed.
	-data sampleData
		data to be used, usually a file name.
	-encoding charsetName
		encoding for reading and writing text, if absent the system default is used.

Example Usage:

bin/authorage AgeClassifyTrainer -model model/en-ageClassify.bin -lang en -data data/train.txt -encoding UTF-8
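When training is scripted, the command line can be assembled from the documented flags. The sketch below is a minimal illustration that uses only flags from the usage line above. The function trainer_command is hypothetical, and it assumes the script is run from the project root so that bin/authorage resolves.

```python
from typing import List, Optional


def trainer_command(
    model: str,
    lang: str,
    data: str,
    encoding: Optional[str] = None,
    params: Optional[str] = None,
) -> List[str]:
    """Build the argument list for bin/authorage AgeClassifyTrainer.

    Hypothetical helper: it only arranges flags documented in the usage
    line and does not validate that the files exist.
    """
    cmd = ["bin/authorage", "AgeClassifyTrainer", "-model", model]
    if params:
        cmd += ["-params", params]
    cmd += ["-lang", lang, "-data", data]
    if encoding:
        cmd += ["-encoding", encoding]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the Python standard library.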

Training data format: age and text separated by a tab on each line, as in <AGE><Tab><TEXT>.
Sample training data:

12	I am just 12 year old
25	I am little bigger
35	I am mature
45	I am getting old
60	I am old like wine
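Before training, it can help to check that every line follows the <AGE><Tab><TEXT> layout. The sketch below is one possible check, not the parser the project uses. The function parse_training_lines is hypothetical, and skipping blank lines is an assumption made here for convenience.

```python
from typing import Iterable, Iterator, Tuple


def parse_training_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (age_or_category, text) pairs from tab-separated lines.

    Hypothetical validator: raises ValueError for a line that lacks a tab,
    an age label, or any text. Blank lines are skipped (an assumption).
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        # Split on the first tab only, so the text itself may contain tabs.
        label, sep, text = line.partition("\t")
        if not sep or not label.strip() or not text.strip():
            raise ValueError(f"line {number}: expected <AGE><Tab><TEXT>")
        yield label.strip(), text.strip()


# Two lines taken from the sample training data above.
sample = ["12\tI am just 12 year old\n", "25\tI am little bigger\n"]
pairs = list(parse_training_lines(sample))
```

The label is kept as a string because the format allows either a numeric age or an age category.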

How to evaluate an Age Classifier Model

Usage: bin/authorage AgeClassifyEvaluator -model model [-misclassified true|false] -data sampleData [-encoding charsetName]

Arguments description:
	-model model
		the model file to be evaluated.
	-misclassified true|false
		if true will print false negatives and false positives.
	-data sampleData
		data to be used, usually a file name.
	-encoding charsetName
		encoding for reading and writing text, if absent the system default is used.

Example Usage:

bin/authorage AgeClassifyEvaluator -model model/en-ageClassify.bin -data data/test.txt -encoding UTF-8

How to run the Age Classifier

Note: Each document must be followed by an empty line to be detected as a separate case from the others.

Usage: bin/authorage AgeClassify model < documents

Usage: bin/authorage AgePredict ./model/classify-unigram.bin ./model/regression-global.bin data/sample_test.txt
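The empty-line convention from the note above can be previewed before sending input to the classifier. The sketch below groups lines into documents the way that note describes. The function split_documents is hypothetical, and two details are assumptions: multi-line documents are joined with spaces, and a final document with no trailing empty line is still returned here, although the note suggests the tool itself may require one.

```python
from typing import Iterable, List


def split_documents(lines: Iterable[str]) -> List[str]:
    """Group consecutive non-empty lines into documents split on empty lines."""
    documents: List[str] = []
    current: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            # An empty line closes the document collected so far.
            documents.append(" ".join(current))
            current = []
    if current:
        # Assumption: keep a trailing document that has no closing empty line.
        documents.append(" ".join(current))
    return documents
```

Counting the returned documents gives a quick check that the input file is segmented as intended.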

Downloads

For AgePredict to work, download en-pos-maxent.bin, en-sent.bin and en-token.bin from http://opennlp.sourceforge.net/models-1.5/ into model/opennlp/.

Citation:

If you use this work, please cite:

@inproceedings{hong2017ensemble,
  title={Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction},
  author={Hong, Joey and Mattmann, Chris and Ramirez, Paul},
  booktitle={Information Reuse and Integration (IRI), 2017 IEEE 18th International Conference on},
  organization={IEEE},
  year={2017}
}

Contributors

Chris A. Mattmann, JPL & USC
Joey Hong, Caltech
Madhav Sharan, JPL & USC

License

Apache License, version 2

USC Information Retrieval & Data Science

USC Information Retrieval and Data Science Group

Versions

Version
1.0 Jul 6, 2017
