Infinispan Hadoop Core

InputFormat/OutputFormat implementations to integrate Infinispan with Hadoop MapReduce and supported tools

Categories: Infinispan Data Caching
GroupId: org.infinispan.hadoop
ArtifactId: infinispan-hadoop-core
Last Version: 0.4
Type: jar
Project Organization: JBoss, a division of Red Hat

How to add to project

Maven:
<dependency>
    <groupId>org.infinispan.hadoop</groupId>
    <artifactId>infinispan-hadoop-core</artifactId>
    <version>0.4</version>
</dependency>

Gradle (Groovy DSL):
implementation 'org.infinispan.hadoop:infinispan-hadoop-core:0.4'

Gradle (Kotlin DSL):
implementation("org.infinispan.hadoop:infinispan-hadoop-core:0.4")

Buildr:
'org.infinispan.hadoop:infinispan-hadoop-core:jar:0.4'

Ivy:
<dependency org="org.infinispan.hadoop" name="infinispan-hadoop-core" rev="0.4">
  <artifact name="infinispan-hadoop-core" type="jar" />
</dependency>

Grape:
@Grapes(
  @Grab(group='org.infinispan.hadoop', module='infinispan-hadoop-core', version='0.4')
)

SBT:
libraryDependencies += "org.infinispan.hadoop" % "infinispan-hadoop-core" % "0.4"

Leiningen:
[org.infinispan.hadoop/infinispan-hadoop-core "0.4"]

Dependencies

compile (1)

Group / Artifact Type Version
org.infinispan : infinispan-client-hotrod jar 9.4.6.Final

provided (3)

Group / Artifact Type Version
org.apache.hadoop : hadoop-mapreduce-client-common jar 3.1.1
org.apache.hadoop : hadoop-mapreduce-client-core jar 3.1.1
org.apache.hadoop : hadoop-common jar 3.1.1

test (8)

Group / Artifact Type Version
junit : junit jar 4.12
org.apache.hadoop : hadoop-yarn-server-tests jar 3.1.1
org.apache.hadoop : hadoop-minicluster jar 3.1.1
org.infinispan : infinispan-core jar 9.4.6.Final
org.jboss.arquillian.junit : arquillian-junit-container jar 1.1.13.Final
org.wildfly.arquillian : wildfly-arquillian-container-managed jar 1.0.2.Final
org.infinispan.arquillian.container : infinispan-arquillian-impl jar 1.2.0.Alpha3
org.jacoco : org.jacoco.agent jar 0.7.5.201505241946

Project Modules

There are no modules declared in this project.

Infinispan Hadoop

Integrations with Apache Hadoop and related frameworks.

Compatibility

Version | Infinispan | Hadoop | Java
0.1 | 8.0.x | 2.x | 8
0.2 | 8.2.x | 2.x | 8
0.3 | 9.4.x | 2.x, 3.x | 8
0.4 | 9.4.x | 2.x, 3.x | 8

InfinispanInputFormat and InfinispanOutputFormat

Implementations of the Hadoop InputFormat and OutputFormat that read data from and write data to Infinispan Server with the best possible data locality. Partitions (input splits) are generated based on segment ownership, which allows the data in a cache to be processed as multiple splits in parallel.
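The idea behind locality-aware splits can be illustrated without Infinispan at all. The sketch below is a simplified model, not the library's code: it assumes a hypothetical map from segment id to the address of the server that owns it (in the real integration this information presumably comes from the cache topology exposed by the Hot Rod client) and groups segments by owner, so each split only covers data held by one server and can be scheduled on or near that host. The class `SegmentSplits`, the record `Split` and the method `groupByOwner` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentSplits {

    /** Hypothetical split model: an owning server and the segments it should read. */
    public record Split(String owner, List<Integer> segments) { }

    /**
     * Groups segments by their owner so that each split reads only data held
     * by a single server. Input: segment id -> owner address ("host:port").
     */
    public static List<Split> groupByOwner(Map<Integer, String> segmentOwners) {
        // TreeMap keeps owners sorted, giving a deterministic order of splits
        Map<String, List<Integer>> byOwner = new TreeMap<>();
        // Visit segments in ascending order so each split's segment list is sorted
        for (Map.Entry<Integer, String> e : new TreeMap<>(segmentOwners).entrySet()) {
            byOwner.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<Split> splits = new ArrayList<>();
        byOwner.forEach((owner, segments) -> splits.add(new Split(owner, segments)));
        return splits;
    }

    public static void main(String[] args) {
        // Four segments spread over two hypothetical servers
        Map<Integer, String> owners = Map.of(
                0, "10.0.0.1:11222",
                1, "10.0.0.2:11222",
                2, "10.0.0.1:11222",
                3, "10.0.0.2:11222");
        for (Split split : groupByOwner(owners)) {
            System.out.println(split.owner() + " -> " + split.segments());
        }
    }
}
```

Running `main` prints one line per server: `10.0.0.1:11222 -> [0, 2]` and `10.0.0.2:11222 -> [1, 3]`. A real implementation also has to cope with segments whose ownership changes while the job runs, which this model ignores.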

Maven Coordinates

<dependency>
    <groupId>org.infinispan.hadoop</groupId>
    <artifactId>infinispan-hadoop-core</artifactId>
    <version>0.4</version>
</dependency>

Sample usage in a Hadoop YARN MapReduce application:

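import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.infinispan.hadoop.*;

// Driver class for the job; "InfinispanJob" is a placeholder name
public class InfinispanJob {

   public static void main(String[] args) throws Exception {
      Configuration configuration = new Configuration();
      // Hot Rod endpoints of the server cluster, in the format host1:port1;host2:port2
      String hosts = "172.17.0.2:11222;172.17.0.3:11222";

      // Configures input/output caches
      configuration.set(InfinispanConfiguration.INPUT_REMOTE_CACHE_SERVER_LIST, hosts);
      configuration.set(InfinispanConfiguration.OUTPUT_REMOTE_CACHE_SERVER_LIST, hosts);

      configuration.set(InfinispanConfiguration.INPUT_REMOTE_CACHE_NAME, "map-reduce-in");
      configuration.set(InfinispanConfiguration.OUTPUT_REMOTE_CACHE_NAME, "map-reduce-out");

      Job job = Job.getInstance(configuration, "Infinispan job");
      job.setJarByClass(InfinispanJob.class);

      // Map and Reduce implementation: MapClass and ReduceClass stand for your own
      // Mapper and Reducer classes; declare their intermediate output types too,
      // e.g. with job.setMapOutputKeyClass(...) and job.setMapOutputValueClass(...)
      job.setMapperClass(MapClass.class);
      job.setReducerClass(ReduceClass.class);

      // Read the input from and write the results to Infinispan
      job.setInputFormatClass(InfinispanInputFormat.class);
      job.setOutputFormatClass(InfinispanOutputFormat.class);

      // Submits the job and waits for it to complete
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}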

Supported configuration properties:

Name | Description | Default
hadoop.ispn.input.filter.factory | The name of the filter factory deployed on the server to pre-filter data before reading | null (no filtering)
hadoop.ispn.input.cache.name | The name of the cache that data is read from | "default"
hadoop.ispn.input.read.batch | Batch size when reading from the cache | 5000
hadoop.ispn.output.write.batch | Batch size when writing to the cache | 500
hadoop.ispn.input.remote.cache.servers | List of servers of the input cache, in the format host1:port1;host2:port2 | localhost:11222
hadoop.ispn.output.cache.name | The name of the cache that job results are written to | "default"
hadoop.ispn.output.remote.cache.servers | List of servers of the output cache, in the format host1:port1;host2:port2 | (not listed)
hadoop.ispn.input.converter | Class name of an implementation of org.infinispan.hadoop.KeyValueConverter, applied after reading from the cache | null (no conversion)
hadoop.ispn.output.converter | Class name of an implementation of org.infinispan.hadoop.KeyValueConverter, applied before writing to the cache | null (no conversion)
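Two of the properties above are easier to grasp with a concrete model: the server list format and the write batch size. The sketch below is an illustration under stated assumptions, not the library's implementation: `ConfigSketch`, `Server`, `parseServerList` and `writeInBatches` are hypothetical names, falling back to port 11222 (the Hot Rod port used in the defaults above) when an entry has no port is an assumption of this sketch, and the `sink` callback merely stands in for a bulk write such as `putAll` on the output cache.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ConfigSketch {

    /** Host and port of one server entry (hypothetical type). */
    public record Server(String host, int port) { }

    /**
     * Parses a list in the format host1:port1;host2:port2.
     * Assumption: an entry without a port falls back to 11222.
     */
    public static List<Server> parseServerList(String list) {
        List<Server> servers = new ArrayList<>();
        for (String entry : list.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                servers.add(new Server(trimmed, 11222));
            } else {
                servers.add(new Server(trimmed.substring(0, colon),
                        Integer.parseInt(trimmed.substring(colon + 1))));
            }
        }
        return servers;
    }

    /**
     * Hands entries to the sink in chunks of at most batchSize, as a setting
     * like hadoop.ispn.output.write.batch suggests. Returns the number of batches.
     */
    public static <K, V> int writeInBatches(Map<K, V> entries, int batchSize,
                                            Consumer<Map<K, V>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        Map<K, V> batch = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == batchSize) { // batch is full: flush it
                sink.accept(batch);
                batch = new LinkedHashMap<>();
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the remainder
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(parseServerList("172.17.0.2:11222;172.17.0.3:11222"));
        Map<Integer, String> entries = new LinkedHashMap<>();
        for (int i = 0; i < 1200; i++) {
            entries.put(i, "value-" + i);
        }
        int n = writeInBatches(entries, 500, b -> System.out.println("batch of " + b.size()));
        System.out.println(n + " batches");
    }
}
```

With 1200 entries and a batch size of 500, `main` reports three batches of 500, 500 and 200 entries. Larger batches mean fewer round trips to the server at the cost of more memory held per task.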

Demos

Refer to https://github.com/infinispan/infinispan-hadoop/tree/master/samples/

Releasing

The $MAVEN_HOME/conf/settings.xml file must contain credentials for the snapshot and release repositories. Add the following entries to its <servers> section:

<server>
   <id>jboss-snapshots-repository</id>
   <username>RELEASE_USER</username>
   <password>RELEASE_PASS</password>
</server>
<server>
   <id>jboss-releases-repository</id>
   <username>RELEASE_USER</username>
   <password>RELEASE_PASS</password>
</server>

To release:

mvn release:prepare release:perform -B

Infinispan

Infinispan is a distributed in-memory key/value data store with optional schema, available under the Apache License 2.0.

Versions

0.4
0.3
0.2
0.1