spark-deployer-core

License	Apache-2.0
Categories	Net
GroupId	net.pishen
ArtifactId	spark-deployer-core_2.10
Last Version	3.0.2
Release Date	Sep 13, 2016
Type	jar
Description	spark-deployer-core
Project URL	https://github.com/pishen/spark-deployer
Project Organization	net.pishen
Source Code Management	https://github.com/pishen/spark-deployer.git

Download spark-deployer-core_2.10

Filename	Size
spark-deployer-core_2.10-3.0.2.pom
spark-deployer-core_2.10-3.0.2.jar	110 KB
spark-deployer-core_2.10-3.0.2-sources.jar	6 KB
spark-deployer-core_2.10-3.0.2-javadoc.jar	358 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/net.pishen/spark-deployer-core_2.10/ -->
<dependency>
    <groupId>net.pishen</groupId>
    <artifactId>spark-deployer-core_2.10</artifactId>
    <version>3.0.2</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/net.pishen/spark-deployer-core_2.10/
implementation 'net.pishen:spark-deployer-core_2.10:3.0.2'

Gradle Kotlin

// https://jarcasting.com/artifacts/net.pishen/spark-deployer-core_2.10/
implementation ("net.pishen:spark-deployer-core_2.10:3.0.2")

Apache Buildr

'net.pishen:spark-deployer-core_2.10:jar:3.0.2'

Apache Ivy

<dependency org="net.pishen" name="spark-deployer-core_2.10" rev="3.0.2">
  <artifact name="spark-deployer-core_2.10" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='net.pishen', module='spark-deployer-core_2.10', version='3.0.2')
)

Scala SBT

libraryDependencies += "net.pishen" % "spark-deployer-core_2.10" % "3.0.2"

Leiningen

[net.pishen/spark-deployer-core_2.10 "3.0.2"]

Dependencies

compile (7)

Group / Artifact	Type	Version
org.scala-lang : scala-library	jar	2.10.6
com.github.pathikrit : better-files_2.10	jar	2.14.0
com.typesafe.play : play-json_2.10	jar	2.4.8
com.amazonaws : aws-java-sdk-ec2	jar	1.11.23
org.scalaj : scalaj-http_2.10	jar	2.3.0
org.slf4s : slf4s-api_2.10	jar	1.7.12
com.typesafe : config	jar	1.3.0

Project Modules

There are no modules declared in this project.

spark-deployer

A Scala tool that helps you deploy an Apache Spark standalone cluster on EC2 and submit jobs to it.
Currently supports Spark 2.0.0+.
There are two modes when using spark-deployer: SBT plugin mode and embedded mode.

SBT plugin mode

Here are the basic steps to run a Spark job (all the sbt commands support TAB-completion):

Set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Prepare a project with structure like below:

project-root
├── build.sbt
├── project
│   └── plugins.sbt
└── src
    └── main
        └── scala
            └── mypackage
                └── Main.scala

Add one line in project/plugins.sbt:

addSbtPlugin("net.pishen" % "spark-deployer-sbt" % "3.0.2")

Write your Spark project's build.sbt (a simple example is given here):

name := "my-project-name"
 
scalaVersion := "2.11.8"
 
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
)

Write your job's algorithm in src/main/scala/mypackage/Main.scala:

package mypackage
 
import org.apache.spark._
 
object Main {
  def main(args: Array[String]): Unit = {
    // set up Spark
    val sc = new SparkContext(new SparkConf())
    // your algorithm: a Monte Carlo estimate of Pi
    val n = 10000000
    val count = sc.parallelize(1 to n).map { i =>
      val x = scala.math.random
      val y = scala.math.random
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    // release Spark resources once the job is finished
    sc.stop()
  }
}
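
Before starting a cluster, it can help to sanity-check the algorithm locally. The following is a minimal sketch of the same Monte Carlo estimate in plain Scala, with no Spark dependency. The object name LocalPi and the fixed seed are illustrative assumptions and are not part of spark-deployer.

```scala
import scala.util.Random

object LocalPi {
  // Estimate Pi by sampling n points in the unit square and counting
  // how many fall inside the quarter circle of radius 1.
  def estimate(n: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y < 1
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit =
    println("Pi is roughly " + estimate(1000000, seed = 42L))
}
```

The fixed seed only makes repeated local runs reproducible; the Spark version above draws fresh random numbers on every run.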

Enter sbt, and build a config by:

> sparkBuildConfig

(Most settings have default values; just press Enter to accept them.)

Create a cluster with 1 master and 2 workers by:

> sparkCreateCluster 2

See your cluster's status by:

> sparkShowMachines

Submit your job by:

> sparkSubmit

When your job is done, destroy your cluster by:

> sparkDestroyCluster

Advanced functions

To build a config with a different name, or to build a config based on an existing one:
```
> sparkBuildConfig <new-config-name>
> sparkBuildConfig <new-config-name> from <old-config-name>
```
All configs are stored as .deployer.json files in the conf/ folder. You can edit them by hand if you know what you're doing.
To change the current config:
```
> sparkChangeConfig <config-name>
```
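
Because configs are plain files, the available ones can be listed with a few lines of Scala. This is a sketch only: it assumes the default conf/ folder and that file names follow the pattern <config-name>.deployer.json. The names ConfigFiles and listConfigs are hypothetical helpers, not part of spark-deployer.

```scala
import java.io.File

object ConfigFiles {
  private val suffix = ".deployer.json"

  // Return the config names found in `dir`, i.e. the file names with the
  // ".deployer.json" suffix stripped. A missing directory yields Nil.
  def listConfigs(dir: File): List[String] =
    Option(dir.listFiles())
      .map(_.toList)
      .getOrElse(Nil)
      .filter(f => f.isFile && f.getName.endsWith(suffix))
      .map(_.getName.stripSuffix(suffix))
      .sorted

  def main(args: Array[String]): Unit =
    listConfigs(new File("conf")).foreach(println)
}
```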

To submit a job with arguments or with a main class:

> sparkSubmit <args>
> sparkSubmitMain mypackage.Main <args>
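
On the job side, the arguments can be read from main's args array. The sketch below assumes that the arguments given to sparkSubmit are forwarded to main unchanged, and that the first one is a sample count; JobArgs and sampleCount are hypothetical names used only for illustration.

```scala
import scala.util.Try

object JobArgs {
  // Read the sample count from the first argument, falling back to a
  // default when it is missing or not a positive integer.
  def sampleCount(args: Array[String], default: Int = 10000000): Int =
    args.headOption
      .flatMap(a => Try(a.toInt).toOption)
      .filter(_ > 0)
      .getOrElse(default)
}
```

In Main, `val n = JobArgs.sampleCount(args)` could then replace the hard-coded sample count.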

To add or remove worker machines dynamically:

> sparkAddWorkers <num-of-workers>
> sparkRemoveWorkers <num-of-workers>

Embedded mode

If you don't want to use sbt, or if you would like to trigger cluster creation from within your Scala application, you can depend on the spark-deployer library directly:

libraryDependencies += "net.pishen" %% "spark-deployer-core" % "3.0.2"

Then, from your Scala code, you can do something like this:

import java.io.File
import sparkdeployer._

// build a ClusterConf
val clusterConf = ClusterConf.build()

// save and load ClusterConf
clusterConf.save("path/to/conf.deployer.json")
val clusterConfReloaded = ClusterConf.load("path/to/conf.deployer.json")

// create cluster and submit job
val sparkDeployer = new SparkDeployer()(clusterConf)

val workers = 2
sparkDeployer.createCluster(workers)

val jar = new File("path/to/job.jar")
val mainClass = "mypackage.Main"
val args = Seq("arg0", "arg1")
sparkDeployer.submit(jar, mainClass, args)

sparkDeployer.destroyCluster()
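
If the submit step throws, the code above never reaches destroyCluster() and the machines keep running. One way to guard against that is a small try/finally wrapper. This is a minimal sketch: ClusterLifecycle and withCluster are hypothetical names, and the wrapper takes plain functions so that it does not depend on any spark-deployer API beyond the calls already shown.

```scala
object ClusterLifecycle {
  // Run `job` between `create` and `destroy`. Once `create` has succeeded,
  // `destroy` is always called, even if `job` throws.
  def withCluster[A](create: () => Unit, destroy: () => Unit)(job: => A): A = {
    create()
    try job
    finally destroy()
  }
}

// Possible usage with the values defined in the example above:
//
// ClusterLifecycle.withCluster(
//   () => sparkDeployer.createCluster(workers),
//   () => sparkDeployer.destroyCluster()
// ) {
//   sparkDeployer.submit(jar, mainClass, args)
// }
```

Note that this sketch does not clean up after a failure inside `create` itself, so a partially created cluster may still need to be destroyed by hand.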

The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must also be set.
You can build job.jar with sbt-assembly from a separate sbt project that contains your Spark job.
For other available functions, see SparkDeployer.scala in the source code.
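
In embedded mode it can be convenient to fail fast when the credentials are missing, before any EC2 call is made. The following sketch checks the two variables named above; AwsEnv and missing are hypothetical helper names, and treating an empty value as missing is an assumption.

```scala
object AwsEnv {
  val required: Seq[String] = Seq("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

  // Return the names of required variables that are unset or empty in `env`.
  def missing(env: Map[String, String] = sys.env): Seq[String] =
    required.filterNot(name => env.get(name).exists(_.nonEmpty))
}
```

Calling `require(AwsEnv.missing().isEmpty, "missing: " + AwsEnv.missing().mkString(", "))` at the start of your application would stop it early with a readable message.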

spark-deployer uses slf4j; remember to add your own backend to see the log output. For example, to print the log to the console, add:

libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.7.14"

FAQ

Could I use another AMI?

Yes, just specify the AMI ID when running sparkBuildConfig. The image should be HVM EBS-backed with Java 7+ installed. You can also run commands on each machine before Spark starts by editing preStartCommands in the JSON config. For example:

"preStartCommands": [
  "sudo bash -c \"echo -e 'LC_ALL=en_US.UTF-8\\nLANG=en_US.UTF-8' >> /etc/environment\"",
  "sudo apt-get -qq install openjdk-8-jre",
  "cd spark/conf/ && cp log4j.properties.template log4j.properties && echo 'log4j.rootCategory=WARN, console' >> log4j.properties"
]
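
Each entry in preStartCommands is a JSON string, so double quotes and backslashes inside the shell command must be escaped, as in the first entry above. The sketch below shows that escaping in plain Scala. PreStartCommand and toJsonString are hypothetical names, and the helper only handles backslashes and double quotes, so each command should stay on a single line.

```scala
object PreStartCommand {
  // Encode a shell command as a JSON string literal: backslashes are
  // escaped first, then double quotes, and the result is wrapped in quotes.
  // Control characters such as real newlines are not handled.
  def toJsonString(command: String): String = {
    val escaped = command.replace("\\", "\\\\").replace("\"", "\\\"")
    "\"" + escaped + "\""
  }

  def main(args: Array[String]): Unit =
    println(toJsonString("sudo bash -c \"echo hi\""))
}
```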

When using a custom AMI, the root device should be the name of your root volume (/dev/sda1 for Ubuntu), which can be enlarged through the disk size settings for the master and workers.

Could I use a custom Spark tarball?

Yes, just change the tgz URL when running sparkBuildConfig. The tgz will be extracted as a spark/ folder in each machine's home folder.

What rules should I set on my security group?

Assuming your security group ID is sg-abcde123, the basic settings are:

Type	Protocol	Port Range	Source
All traffic	All	All	`sg-abcde123`
SSH	TCP	22	`<your-allowed-ip>`
Custom TCP Rule	TCP	8080-8081	`<your-allowed-ip>`
Custom TCP Rule	TCP	4040	`<your-allowed-ip>`

How do I upgrade the config to a new version of spark-deployer?

Switch to the config you want to upgrade and run sparkUpgradeConfig to build a new config based on the settings in the old one. If this doesn't work, or if you don't mind rebuilding from scratch, it's recommended to create a new config directly with sparkBuildConfig.

Could I change the directory where configurations are saved?

You can change it by adding the following line to your build.sbt:

sparkConfigDir := "path/to/my-config-dir"

How to contribute

Please report an issue or ask on Gitter if you run into any problems.
Pull requests are welcome.

Versions

Version
3.0.2 Sep 13, 2016
3.0.1 Sep 9, 2016
3.0.0 Sep 8, 2016
2.8.2 May 18, 2016
2.8.1 May 10, 2016
2.8.0 May 5, 2016
2.7.1 May 4, 2016
2.7.0 May 3, 2016
2.6.0 Apr 27, 2016
2.5.0 Apr 25, 2016
2.4.0 Apr 22, 2016
2.3.0 Apr 21, 2016
2.2.0 Apr 21, 2016
2.1.0 Apr 15, 2016
2.0.1 Apr 14, 2016
2.0.0 Apr 5, 2016
1.3.0 Mar 10, 2016
1.2.0 Mar 1, 2016
1.1.1 Feb 28, 2016
1.1.0 Feb 2, 2016
1.0.1 Jan 27, 2016
1.0.0 Jan 26, 2016
0.13.0 Jan 11, 2016
0.12.0 Dec 21, 2015
0.11.1 Nov 28, 2015
0.11.0 Nov 24, 2015
0.10.1 Nov 20, 2015
0.10.0 Nov 19, 2015
0.9.2 Oct 25, 2015
0.9.1 Oct 4, 2015
0.9.0 Oct 2, 2015
0.8.0 Sep 25, 2015
0.7.4 Aug 13, 2015
0.7.3 Aug 5, 2015
0.7.2 Aug 1, 2015
0.7.1 Jul 30, 2015
0.6.1 Jul 29, 2015
0.6.0 Jul 26, 2015
0.5.2 Jul 24, 2015
0.5.1 Jun 23, 2015
0.5.0 Jun 5, 2015
