spark-plug

GroupId: com.bizo
ArtifactId: spark-plug_2.11
Last Version: 1.2.6
Type: jar
Description: spark-plug
Project URL: https://github.com/ogrodnek/spark-plug
Project Organization: com.bizo
Source Code Management: https://github.com/ogrodnek/spark-plug
How to add to project

Maven

<!-- https://jarcasting.com/artifacts/com.bizo/spark-plug_2.11/ -->
<dependency>
    <groupId>com.bizo</groupId>
    <artifactId>spark-plug_2.11</artifactId>
    <version>1.2.6</version>
</dependency>

Gradle (Groovy DSL)

// https://jarcasting.com/artifacts/com.bizo/spark-plug_2.11/
implementation 'com.bizo:spark-plug_2.11:1.2.6'

Gradle (Kotlin DSL)

// https://jarcasting.com/artifacts/com.bizo/spark-plug_2.11/
implementation("com.bizo:spark-plug_2.11:1.2.6")

Buildr

'com.bizo:spark-plug_2.11:jar:1.2.6'

Ivy

<dependency org="com.bizo" name="spark-plug_2.11" rev="1.2.6">
  <artifact name="spark-plug_2.11" type="jar" />
</dependency>

Groovy Grape

@Grapes(
  @Grab(group='com.bizo', module='spark-plug_2.11', version='1.2.6')
)

SBT

libraryDependencies += "com.bizo" % "spark-plug_2.11" % "1.2.6"

Leiningen

[com.bizo/spark-plug_2.11 "1.2.6"]

Dependencies

compile (4)

Group / Artifact                           Type  Version
org.scala-lang : scala-library             jar   2.11.5
com.amazonaws : aws-java-sdk               jar   1.10.16
com.googlecode.json-simple : json-simple   jar   1.1.1
commons-lang : commons-lang                jar   2.6

test (2)

Group / Artifact                 Type  Version
junit : junit                    jar   4.10
com.novocode : junit-interface   jar   0.10-M4

Project Modules

There are no modules declared in this project.

spark-plug


A Scala driver for launching Amazon EMR jobs

why?

We run a lot of reports. In the past, these were kicked off by bash scripts that typically did date math, copied scripts and config files to S3, and then called the Amazon elastic-mapreduce command-line client to launch the job. The EMR client invocation ends up being dozens of lines of bash adding each step and passing arguments.

It's been a pain to share defaults or add any abstraction over common job steps, and performing date arithmetic and conditionally adding EMR steps in bash is awkward. Lastly, the EMR command-line client offers less control over certain options than the EMR API does.

simple example

val flow = JobFlow(
  name      = s"${stage}: analytics report [${date}]",
  cluster   = Master() + Core(8) + Spot(10),
  bootstrap = Seq(MemoryIntensive),
  steps     = Seq(
    SetupDebugging(),
    new HiveStep("s3://bucket/location/report.sql",
      Map("YEAR" -> year, "MONTH" -> month, "DAY" -> day))
  )
)

val id = Emr.run(flow)(ClusterDefaults(hadoop="1.0.3"))
println(id)
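The example above hard-codes year, month, and day; in Scala, the date math and conditional step assembly that were painful in bash become ordinary code. A hypothetical sketch of that pattern (the DEBUG switch is invented for illustration, and plain strings stand in for spark-plug step objects such as SetupDebugging and HiveStep):

```scala
import java.time.LocalDate

object AssembleSteps extends App {
  // Run yesterday's report -- the kind of date math the bash
  // scripts used to do by hand.
  val date  = LocalDate.now().minusDays(1)
  val year  = date.getYear.toString
  val month = f"${date.getMonthValue}%02d"
  val day   = f"${date.getDayOfMonth}%02d"

  // Conditionally include a debugging step. The DEBUG environment
  // switch is an assumption for this sketch, and the strings stand
  // in for spark-plug step objects.
  val debug = sys.env.get("DEBUG").contains("1")
  val steps =
    (if (debug) Seq("SetupDebugging()") else Seq.empty) :+
      s"HiveStep(report.sql, $year-$month-$day)"

  println(steps.mkString(", "))
}
```

The resulting Seq would be passed as the steps argument of a JobFlow, as in the example above.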

API documentation

download

Available in Maven Central as com.bizo spark-plug_2.11

Versions

1.2.6