com.groupon.dse:baryon

A library for building Spark streaming applications that consume data from Kafka.

License	The BSD 3-Clause License
GroupId	com.groupon.dse
ArtifactId	baryon
Last Version	1.0
Release Date	Jul 5, 2016
Type	jar
Description	A library for building Spark streaming applications that consume data from Kafka.
Project URL	https://github.com/groupon/baryon
Source Code Management	https://github.com/groupon/baryon

Download baryon

Filename	Size
baryon-1.0.pom
baryon-1.0.jar	248 KB
baryon-1.0-tests.jar	417 KB
baryon-1.0-test-sources.jar	55 KB
baryon-1.0-sources.jar	77 KB
baryon-1.0-javadoc.jar	610 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.groupon.dse/baryon/ -->
<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>baryon</artifactId>
    <version>1.0</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.groupon.dse/baryon/
implementation 'com.groupon.dse:baryon:1.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.groupon.dse/baryon/
implementation ("com.groupon.dse:baryon:1.0")

Apache Buildr

'com.groupon.dse:baryon:jar:1.0'

Apache Ivy

<dependency org="com.groupon.dse" name="baryon" rev="1.0">
  <artifact name="baryon" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.groupon.dse', module='baryon', version='1.0')
)

Scala SBT

libraryDependencies += "com.groupon.dse" % "baryon" % "1.0"

Leiningen

[com.groupon.dse/baryon "1.0"]

Dependencies

compile (14)

Group / Artifact	Type	Version
org.apache.kafka : kafka_2.10	jar	0.8.1.1
com.101tec : zkclient	jar	0.7
com.groupon.dse : spark-metrics	jar	1.0
org.apache.zookeeper : zookeeper	jar	3.4.6
org.json4s : json4s-core_2.10	jar	3.2.10
org.json4s : json4s-jackson_2.10	jar	3.2.10
org.scala-lang : scala-library	jar	2.10.4
org.slf4j : slf4j-api	jar	1.7.10
com.typesafe.play : play-ws_2.10	jar	2.3.10
com.typesafe.play : play-json_2.10	jar	2.3.10
com.fasterxml.jackson.core : jackson-databind	jar	2.4.4
com.fasterxml.jackson.module : jackson-module-scala_2.10	jar	2.4.4
com.fasterxml.jackson.core : jackson-core	jar	2.4.4
com.ning : async-http-client	jar	1.9.21

provided (4)

Group / Artifact	Type	Version
org.apache.spark : spark-core_2.10	jar	1.5.2
org.apache.spark : spark-streaming_2.10	jar	1.5.2
log4j : log4j	jar	1.2.17
org.apache.hadoop : hadoop-common	jar	2.2.0

test (2)

Group / Artifact	Type	Version
org.mockito : mockito-all	jar	1.10.8
org.scalatest : scalatest_2.10	jar	2.2.4

Project Modules

There are no modules declared in this project.

Baryon

Baryon is a library for building Spark streaming applications that consume data from Kafka.

Baryon abstracts away all the bookkeeping involved in reliably connecting to a Kafka cluster and fetching data from it, so that users only need to focus on the logic to process this data.

For a detailed guide on getting started with Baryon, take a look at the wiki.

Why Baryon?

Spark has its own libraries for interacting with Kafka, as documented in its Kafka integration guide. These libraries are well developed, but they have certain limitations that Baryon aims to address:

Code-independent checkpointing

Baryon's Kafka state management system allows Kafka consumption state to be stored across multiple runs of an application, even when there are code changes. Spark's checkpointing system does not support maintaining state across changes in code, so users of Spark's Kafka libraries must implement the offset management logic themselves.
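To make the idea of code-independent state concrete, here is a minimal sketch of storing consumption offsets in a format that does not depend on the application's classes. It is not Baryon's actual state management API: `TopicPartition`, `OffsetStore`, and the tab-separated file layout are all hypothetical, and a local file stands in for whatever external store an application would really use.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Hypothetical illustration only; these names are not part of Baryon.
final case class TopicPartition(topic: String, partition: Int)

object OffsetStore {
  // Write one "topic<TAB>partition<TAB>offset" line per partition.
  // Plain text carries no serialized application classes, so the saved
  // state stays readable after the application code changes.
  def save(path: Path, offsets: Map[TopicPartition, Long]): Unit = {
    val lines = offsets.toSeq
      .sortBy { case (tp, _) => (tp.topic, tp.partition) }
      .map { case (tp, offset) => s"${tp.topic}\t${tp.partition}\t$offset" }
    Files.write(path, lines.asJava, StandardCharsets.UTF_8)
  }

  // Read the offsets back; a missing file means no state has been saved yet.
  def load(path: Path): Map[TopicPartition, Long] =
    if (!Files.exists(path)) Map.empty[TopicPartition, Long]
    else Files.readAllLines(path, StandardCharsets.UTF_8).asScala
      .filter(_.nonEmpty)
      .map { line =>
        val Array(topic, partition, offset) = line.split('\t')
        TopicPartition(topic, partition.toInt) -> offset.toLong
      }.toMap
}
```

A restarted application, even one built from different code, could call `OffsetStore.load` and resume each partition from the saved offset. This is the kind of bookkeeping that users of Spark's Kafka libraries would otherwise write themselves.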

Improved error handling

Baryon handles errors related to Kafka much more thoroughly than Spark's Kafka libraries, so users don't need to worry about handling Kafka problems in their code.

Beyond these, Baryon offers a handful of features of its own:

Multiple consumption modes

Baryon has two consumption modes, blocking and non-blocking, and switching between them requires no code changes. The blocking mode roughly corresponds to the consumption behavior of Spark's "direct" approach, while the non-blocking mode behaves similarly to Spark's receiver-based approach.

Dynamically configured topics

Baryon supports changes to the set of Kafka topics that are consumed while the application is running. Alongside this, configurations can be set at a per-topic level, which makes it easier to build a single application to process multiple, heterogeneous data streams.
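The two ideas in the paragraph above, a topic set that changes at runtime and settings that vary per topic, can be sketched as follows. This does not use Baryon's configuration API or its real property names; `TopicConfig`, its methods, and the keys in the usage note are assumptions made for illustration.

```scala
// Hypothetical illustration only; not Baryon's configuration API.
object TopicConfig {
  // Effective settings for one topic: a per-topic override wins over the default.
  def resolve(
      defaults: Map[String, String],
      overrides: Map[String, Map[String, String]],
      topic: String): Map[String, String] =
    defaults ++ overrides.getOrElse(topic, Map.empty[String, String])

  // Compare the topic set from the previous refresh with the current one.
  // Returns (topics to start consuming, topics to stop consuming).
  def diff(previous: Set[String], current: Set[String]): (Set[String], Set[String]) =
    (current -- previous, previous -- current)
}
```

For example, with defaults `Map("fetch.size" -> "1048576")` and an override `Map("clicks" -> Map("fetch.size" -> "4194304"))`, `resolve` returns the larger fetch size for `clicks` and the default for every other topic. The key name `fetch.size` is made up for this example.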

Aggregated metrics

Baryon uses the spark-metrics library to collect and aggregate useful metrics across the driver and executors. These include offset lag, throughput, and error rates, as well as augmented versions of metrics that Spark already provides. The metrics are integrated with Spark's metrics system, so they are compatible with the reporting system that comes with Spark.
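As an illustration of one metric named above, offset lag is commonly defined as the distance between the latest offset available in a partition and the last offset the application has consumed. The source does not say how Baryon computes it, so the sketch below assumes that common definition; `OffsetLag` and its methods are hypothetical, and treating an unseen partition as consumed up to offset 0 is a further assumption.

```scala
// Hypothetical illustration only; not part of Baryon or spark-metrics.
object OffsetLag {
  // Lag per partition = latest available offset - last consumed offset, floored at 0.
  // Assumption: a partition with no recorded consumption counts as consumed up to 0.
  def perPartition(latest: Map[Int, Long], consumed: Map[Int, Long]): Map[Int, Long] =
    latest.map { case (partition, end) =>
      partition -> math.max(0L, end - consumed.getOrElse(partition, 0L))
    }

  // Total lag for a topic is the sum over its partitions.
  def total(latest: Map[Int, Long], consumed: Map[Int, Long]): Long =
    perPartition(latest, consumed).values.sum
}
```

With latest offsets `Map(0 -> 100L, 1 -> 50L)` and consumed offsets `Map(0 -> 90L)`, the per-partition lag is `Map(0 -> 10, 1 -> 50)` and the total is 60.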

Quick Start

Add Baryon as a dependency:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>baryon</artifactId>
    <version>1.0</version>
</dependency>

If you want to add custom metrics that are integrated with Spark, add the spark-metrics library, which Baryon itself uses:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>spark-metrics</artifactId>
    <version>1.0</version>
</dependency>

Take a look at the examples to see how to write the driver and a ReceiverPlugin.

Groupon

Versions

Version
1.0 Jul 5, 2016
