JPMML-Evaluator-Spark

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

Categories: JPMML Business Logic Libraries Machine Learning
GroupId: org.jpmml
ArtifactId: pmml-evaluator-spark
Last Version: 1.0.0
Type: jar
Project URL: http://www.jpmml.org

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/org.jpmml/pmml-evaluator-spark/ -->
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator-spark</artifactId>
    <version>1.0.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/org.jpmml/pmml-evaluator-spark/
implementation 'org.jpmml:pmml-evaluator-spark:1.0.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/org.jpmml/pmml-evaluator-spark/
implementation ("org.jpmml:pmml-evaluator-spark:1.0.0")

Buildr:

'org.jpmml:pmml-evaluator-spark:jar:1.0.0'

Ivy:

<dependency org="org.jpmml" name="pmml-evaluator-spark" rev="1.0.0">
  <artifact name="pmml-evaluator-spark" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='org.jpmml', module='pmml-evaluator-spark', version='1.0.0')
)

SBT:

libraryDependencies += "org.jpmml" % "pmml-evaluator-spark" % "1.0.0"

Leiningen:

[org.jpmml/pmml-evaluator-spark "1.0.0"]

Dependencies

compile (1)

Group / Artifact                        Type  Version
org.jpmml : pmml-evaluator              jar   1.3.6

provided (4)

Group / Artifact                        Type  Version
org.apache.spark : spark-catalyst_2.10  jar   [1.5.0, 1.6.3]
org.apache.spark : spark-core_2.10      jar   [1.5.0, 1.6.3]
org.apache.spark : spark-mllib_2.10     jar   [1.5.0, 1.6.3]
org.apache.spark : spark-sql_2.10       jar   [1.5.0, 1.6.3]

Project Modules

There are no modules declared in this project.

JPMML-Evaluator-Spark

PMML evaluator library for the Apache Spark cluster computing system (https://spark.apache.org/).

Features

  • Full support for PMML specification versions 3.0 through 4.3. The evaluation is handled by the JPMML-Evaluator library.

Prerequisites

  • Apache Spark version 2.0.X, 2.1.X, 2.2.X, 2.3.X or 2.4.X.

Installation

The JPMML-Evaluator-Spark library JAR file (together with the accompanying Java source and Javadoc JAR files) is released via the Maven Central Repository.

The current version is 1.2.2 (16 January, 2019).

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>jpmml-evaluator-spark</artifactId>
	<version>1.2.2</version>
</dependency>

A note about building and packaging JPMML-Evaluator-Spark applications: the JPMML-Evaluator library depends on versions of the JPMML-Model and Google Guava libraries that conflict with the ones bundled with Apache Spark and/or Apache Hadoop. The conflict can be resolved by relocating the JPMML-Evaluator library dependencies to a different namespace using the Apache Maven Shade Plugin.
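The relocation described above can be sketched as a Maven Shade Plugin configuration in the application's pom.xml. This is only an illustrative fragment: the `shaded.` prefix is a hypothetical namespace, and the list of relocated packages (`org.dmg.pmml` and `org.jpmml.model` for JPMML-Model, `com.google.common` for Guava) is an assumption that should be checked against the classes that actually clash in a given deployment. Pin a plugin version appropriate for your build.

```xml
<plugin>
	<groupId>org.apache.maven.plugins</groupId>
	<artifactId>maven-shade-plugin</artifactId>
	<executions>
		<execution>
			<phase>package</phase>
			<goals>
				<goal>shade</goal>
			</goals>
			<configuration>
				<relocations>
					<!-- Google Guava; "shaded." is a hypothetical target namespace -->
					<relocation>
						<pattern>com.google.common</pattern>
						<shadedPattern>shaded.com.google.common</shadedPattern>
					</relocation>
					<!-- JPMML-Model class model (assumed package name) -->
					<relocation>
						<pattern>org.dmg.pmml</pattern>
						<shadedPattern>shaded.org.dmg.pmml</shadedPattern>
					</relocation>
					<!-- JPMML-Model utilities (assumed package name) -->
					<relocation>
						<pattern>org.jpmml.model</pattern>
						<shadedPattern>shaded.org.jpmml.model</shadedPattern>
					</relocation>
				</relocations>
			</configuration>
		</execution>
	</executions>
</plugin>
```

With relocation in place, the application's uber-JAR carries its own renamed copies of these classes, so the versions bundled with the cluster are no longer picked up in their place.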

Usage

Building a generic transformer based on a PMML byte stream:

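import java.io.InputStream;

import org.apache.spark.ml.Transformer;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorBuilder;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;
import org.jpmml.evaluator.spark.TransformerBuilder;
import org.jpmml.evaluator.visitors.DefaultVisitorBattery;

InputStream pmmlIs = ...;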

EvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
	.setLocatable(false)
	.setVisitors(new DefaultVisitorBattery())
	.load(pmmlIs);

Evaluator evaluator = evaluatorBuilder.build();

// Performing a self-check (doubles as a warm-up)
evaluator.verify();

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withTargetCols()
	.withOutputCols()
	.exploded(false);

Transformer pmmlTransformer = pmmlTransformerBuilder.build();

Building an Apache Spark ML-style regressor when the PMML document is known to contain a regression model (e.g. the auto-mpg dataset):

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withLabelCol("MPG") // Double column
	.exploded(true);

Building an Apache Spark ML-style classifier when the PMML document is known to contain a classification model (e.g. the iris-species dataset):

import java.util.Arrays;

TransformerBuilder pmmlTransformerBuilder = new TransformerBuilder(evaluator)
	.withLabelCol("Species") // String column
	.withProbabilityCol("Species_probability", Arrays.asList("setosa", "versicolor", "virginica")) // Vector column
	.exploded(true);
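To make the relationship between the label list and the probability vector concrete, here is a minimal plain-Java sketch of how a consumer might turn one probability vector into a predicted label. It assumes that the vector elements follow the order of the labels passed to withProbabilityCol; the class name, method name and probability values are hypothetical and not part of the library.

```java
import java.util.Arrays;
import java.util.List;

final class ProbabilitySketch {

    // Returns the label with the highest probability; ties go to the earliest label.
    // Assumes probabilities[i] belongs to labels.get(i).
    static String mostProbableLabel(List<String> labels, double[] probabilities) {
        if (labels.isEmpty() || labels.size() != probabilities.length) {
            throw new IllegalArgumentException("labels and probabilities must be non-empty and of equal length");
        }
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return labels.get(best);
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("setosa", "versicolor", "virginica");
        double[] probabilities = {0.02, 0.91, 0.07}; // made-up values for illustration
        System.out.println(mostProbableLabel(labels, probabilities)); // prints "versicolor"
    }
}
```

In practice the label column already carries the predicted class, so a helper like this is mainly useful when applying a custom decision threshold to the probability vector.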

Scoring data:

import org.apache.spark.sql.Dataset;

Dataset<?> inputDs = ...;

Dataset<?> resultDs = pmmlTransformer.transform(inputDs);

In default mode, the transformation appends an intermediary "pmml" column to the data frame, which contains all the requested result columns:

root
 |-- Sepal_Length: double (nullable = true)
 |-- Sepal_Width: double (nullable = true)
 |-- Petal_Length: double (nullable = true)
 |-- Petal_Width: double (nullable = true)
 |-- pmml: struct (nullable = true)
 |    |-- Species: string (nullable = false)
 |    |-- Species_probability: vector (nullable = false)

In exploded mode, the transformation appends all the requested result columns to the data frame:

root
 |-- Sepal_Length: double (nullable = true)
 |-- Sepal_Width: double (nullable = true)
 |-- Petal_Length: double (nullable = true)
 |-- Petal_Width: double (nullable = true)
 |-- Species: string (nullable = false)
 |-- Species_probability: vector (nullable = false)
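The difference between the two schemas above can be modelled in plain Java, outside of Spark, by treating a row as an ordered map and the "pmml" struct as a nested map. The sketch below is only an analogy for what exploded mode does to the column layout; the class name, method name and sample values are hypothetical, and no Spark or JPMML API is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExplodeSketch {

    // Lifts the fields of the nested struct column to the top level,
    // preserving column order; the struct column itself is dropped.
    static Map<String, Object> explode(Map<String, Object> row, String structCol) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (entry.getKey().equals(structCol) && entry.getValue() instanceof Map) {
                Map<?, ?> struct = (Map<?, ?>) entry.getValue();
                for (Map.Entry<?, ?> field : struct.entrySet()) {
                    result.put(String.valueOf(field.getKey()), field.getValue());
                }
            } else {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default mode: results are grouped under a single "pmml" column
        Map<String, Object> pmml = new LinkedHashMap<>();
        pmml.put("Species", "setosa");
        pmml.put("Species_probability", new double[] {0.9, 0.05, 0.05}); // made-up values

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("Sepal_Length", 5.1);
        row.put("Sepal_Width", 3.5);
        row.put("Petal_Length", 1.4);
        row.put("Petal_Width", 0.2);
        row.put("pmml", pmml);

        // Exploded mode: result columns sit next to the input columns
        System.out.println(explode(row, "pmml").keySet());
        // prints [Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, Species, Species_probability]
    }
}
```

Default mode keeps the model's results in one namespace, which avoids name clashes with input columns; exploded mode yields the flat layout that Apache Spark ML-style pipelines typically expect.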

License

JPMML-Evaluator-Spark is dual-licensed under the GNU Affero General Public License (AGPL) version 3.0, and a commercial license.

Additional information

JPMML-Evaluator-Spark is developed and maintained by Openscoring Ltd, Estonia.

Interested in using JPMML software in your application? Please contact [email protected]

org.jpmml

Java PMML API

Java libraries for producing and consuming PMML documents
