Sparkplug Common Code

A Scala library that exposes Spark via message queues.

GroupId

software.uncharted.sparkplug

ArtifactId

sparkplug-common

Last Version

0.2.1

Type

jar

Description

Sparkplug Common Code
A Scala library that exposes Spark via message queues.

Project URL

https://github.com/unchartedsoftware/sparkplug

Source Code Management

https://github.com/unchartedsoftware/sparkplug

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/software.uncharted.sparkplug/sparkplug-common/ -->
<dependency>
    <groupId>software.uncharted.sparkplug</groupId>
    <artifactId>sparkplug-common</artifactId>
    <version>0.2.1</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/software.uncharted.sparkplug/sparkplug-common/
implementation 'software.uncharted.sparkplug:sparkplug-common:0.2.1'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/software.uncharted.sparkplug/sparkplug-common/
implementation("software.uncharted.sparkplug:sparkplug-common:0.2.1")

Buildr:

'software.uncharted.sparkplug:sparkplug-common:jar:0.2.1'

Ivy:

<dependency org="software.uncharted.sparkplug" name="sparkplug-common" rev="0.2.1">
  <artifact name="sparkplug-common" type="jar" />
</dependency>

Grape:

@Grapes(
  @Grab(group='software.uncharted.sparkplug', module='sparkplug-common', version='0.2.1')
)

SBT:

libraryDependencies += "software.uncharted.sparkplug" % "sparkplug-common" % "0.2.1"

Leiningen:

[software.uncharted.sparkplug/sparkplug-common "0.2.1"]

Dependencies

compile (3)

Group : Artifact                                    Type  Version
com.typesafe.akka : akka-stream-experimental_2.10   jar   2.0.4
io.scalac : reactive-rabbit_2.10                     jar   1.0.3
com.typesafe : config                                jar   1.3.0

test (2)

Group : Artifact                                    Type  Version
org.scalatest : scalatest_2.10                       jar   2.2.6
org.pegdown : pegdown                                jar   1.6.0

Project Modules

There are no modules declared in this project.

sparkplug

about sparkplug

sparkplug allows Apache Spark to act as a near-real-time, first-class citizen in an enterprise architecture by exposing Spark via RabbitMQ.

sparkplug components

Simply put, you write PlugHandler classes that are triggered by commands, allowing multiple tasks to be embedded in one library. The PlugClient provides an easy way to submit jobs to the server. One of the easiest ways to get data between the two systems is to have the Spark job write its output to a shared cache, which the business logic can then read once the PlugClient response is handled.

using sparkplug

To achieve this, create a custom library that includes sparkplug-server, then:

  1. Implement PlugHandler as many times as you want; each PlugHandler registers to listen for a specific command (a string tag that lets different messages trigger different actions). A sketch of a handler follows this list.

  2. Get a handle to the Plug object and register the various PlugHandler implementations

    val plug = Plug.getInstance()
    plug.connect()
    plug.registerHandler("test", plugHandler)
  3. Tell the Plug to start consuming messages!

    plug.run()
  4. When you are done with the Plug, clean up nicely via shutdown()
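
Step 1 is where your application logic lives. The exact shape of the PlugHandler trait is defined by sparkplug-server; the sketch below is a minimal illustration that assumes a single callback receiving the message body and returning a response payload. The import path, method name, and types are assumptions, not the library's actual API:

// NOTE: the import path and onMessage signature below are assumptions;
// check the sparkplug-server sources for the real PlugHandler trait.
import software.uncharted.sparkplug.handler.PlugHandler

class WordCountHandler extends PlugHandler {
  // Assumed callback: invoked when a message tagged with this handler's
  // registered command arrives on the inbound queue.
  override def onMessage(command: String, body: Array[Byte]): Array[Byte] = {
    val text = new String(body, "UTF-8")
    // Do the Spark work this command represents, then return a payload
    // that sparkplug publishes on the outbound queue.
    val wordCount = text.split("\\s+").count(_.nonEmpty)
    s"""{"wordCount": $wordCount}""".getBytes("UTF-8")
  }
}

Registered as in step 2 (plug.registerHandler("test", new WordCountHandler)), this handler runs for every message tagged with the "test" command.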

Compile the JAR and load it with spark-submit; Spark will now fire off jobs in response to messages from the queue.

customizing sparkplug

sparkplug uses the Typesafe configuration library, and comes with an embedded reference.conf with the following defaults:

sparkplug {
  master = "spark://localhost:7077"

  inbound-queue = "q_sparkplug"
  outbound-queue = "r_sparkplug"
}

amqp {
  addresses = [
    { host = "localhost", port = 5672 }
  ],
  virtual-host = "/",
  username = "guest",
  password = "guest",
  heartbeat = disable,
  timeout = 30000,
  automatic-recovery = true,
  recovery-interval = 5s
  ssl = disable
}

These settings are reasonable for development/testing, but are NOT viable in production; you really will want to turn SSL/TLS on, ensure heartbeats are working, and use a RabbitMQ cluster. Similarly, the sparkplug.master URL will probably need to be changed.
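
For example, a production deployment can override just the keys that differ in its own application.conf, which the Typesafe config library layers over the embedded reference.conf at load time. The hosts and credentials below are placeholders, and the accepted values for heartbeat and ssl are defined by the underlying reactive-rabbit driver, so treat those two as illustrative:

# application.conf -- illustrative values only
sparkplug {
  master = "spark://spark-master.example.com:7077"
}

amqp {
  addresses = [
    { host = "rabbit-1.example.com", port = 5671 },
    { host = "rabbit-2.example.com", port = 5671 }
  ],
  username = "sparkplug-svc",
  password = "not-the-default",
  heartbeat = 30s,
  ssl = "TLSv1.2"
}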

the sparkplug client

The flip-side of a system that listens to queues is some other system that puts messages on the queue and consumes the responses from Spark when a job completes. You can do this manually: simply submit messages to the inbound queue and listen for responses on the outbound queue. However, if that sounds like too much effort, we've created the PlugClient for you!

  1. Implement the PlugResponseHandler (note: unlike the server, which registers one handler per command, the client has a single response handler; embed whatever dispatch logic you need inside it). A sketch follows this list.

  2. Get a handle to the PlugClient object and register the PlugResponseHandler implementation

    val plugClient = PlugClient.getInstance()
    plugClient.setHandler(handler)
  3. Tell the PlugClient to start consuming messages!

    plugClient.connect()
  4. When you are done with the PlugClient, clean up nicely via shutdown()
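
As with the server side, the exact PlugResponseHandler signature lives in the sparkplug sources; the sketch below assumes a single callback receiving the command tag and response body, plus a send-style method for publishing requests. Those two names are assumptions; getInstance, setHandler, connect, and shutdown come from the steps above:

// NOTE: the import path, onResponse callback, and send method are
// assumptions for illustration; check the sparkplug sources for the real API.
import software.uncharted.sparkplug.client.{PlugClient, PlugResponseHandler}

val handler = new PlugResponseHandler {
  // All responses arrive through this single handler; dispatch on the
  // command tag to decide what to do with each one.
  override def onResponse(command: String, body: Array[Byte]): Unit =
    command match {
      case "test" => println(s"test result: ${new String(body, "UTF-8")}")
      case other  => println(s"unhandled command: $other")
    }
}

val plugClient = PlugClient.getInstance()
plugClient.setHandler(handler)
plugClient.connect()

// Assumed publish method: put a command and payload on the inbound queue.
plugClient.send("test", "hello from the client".getBytes("UTF-8"))

// ...later, when finished:
plugClient.shutdown()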

testing sparkplug

Because this library requires Spark and RabbitMQ, full integration testing runs in two Docker containers: one with RabbitMQ, and another with a mini Spark cluster plus Gradle for running the tests.

To instantiate the environment, simply run the following:

./build_environment.sh /bin/bash
./gradlew clean test

roadmap

  1. Better error handling
  2. Java-based sparkplug client
  3. sparkplug broker that allows one to distribute jobs across multiple clusters