com.groupon.dse:baryon

A library for building Spark streaming applications that consume data from Kafka.

GroupId: com.groupon.dse
ArtifactId: baryon
Last Version: 1.0
Type: jar
Project URL: https://github.com/groupon/baryon
Source Code Management: https://github.com/groupon/baryon

How to add to project

Maven:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>baryon</artifactId>
    <version>1.0</version>
</dependency>

Gradle (Groovy DSL):

implementation 'com.groupon.dse:baryon:1.0'

Gradle (Kotlin DSL):

implementation("com.groupon.dse:baryon:1.0")

Buildr:

'com.groupon.dse:baryon:jar:1.0'

Ivy:

<dependency org="com.groupon.dse" name="baryon" rev="1.0">
  <artifact name="baryon" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='com.groupon.dse', module='baryon', version='1.0')
)

sbt:

libraryDependencies += "com.groupon.dse" % "baryon" % "1.0"

Leiningen:

[com.groupon.dse/baryon "1.0"]

Dependencies

compile (14)

Group : Artifact                                          Type  Version
org.apache.kafka : kafka_2.10                             jar   0.8.1.1
com.101tec : zkclient                                     jar   0.7
com.groupon.dse : spark-metrics                           jar   1.0
org.apache.zookeeper : zookeeper                          jar   3.4.6
org.json4s : json4s-core_2.10                             jar   3.2.10
org.json4s : json4s-jackson_2.10                          jar   3.2.10
org.scala-lang : scala-library                            jar   2.10.4
org.slf4j : slf4j-api                                     jar   1.7.10
com.typesafe.play : play-ws_2.10                          jar   2.3.10
com.typesafe.play : play-json_2.10                        jar   2.3.10
com.fasterxml.jackson.core : jackson-databind             jar   2.4.4
com.fasterxml.jackson.module : jackson-module-scala_2.10  jar   2.4.4
com.fasterxml.jackson.core : jackson-core                 jar   2.4.4
com.ning : async-http-client                              jar   1.9.21

provided (4)

Group : Artifact                                          Type  Version
org.apache.spark : spark-core_2.10                        jar   1.5.2
org.apache.spark : spark-streaming_2.10                   jar   1.5.2
log4j : log4j                                             jar   1.2.17
org.apache.hadoop : hadoop-common                         jar   2.2.0

test (2)

Group : Artifact                                          Type  Version
org.mockito : mockito-all                                 jar   1.10.8
org.scalatest : scalatest_2.10                            jar   2.2.4

Project Modules

There are no modules declared in this project.

Baryon

Baryon is a library for building Spark streaming applications that consume data from Kafka.

Baryon abstracts away the bookkeeping involved in reliably connecting to a Kafka cluster and fetching data from it, so that users can focus on the logic that processes the data.

For a detailed guide to getting started with Baryon, see the project wiki.

Why Baryon?

Spark ships its own libraries for interacting with Kafka, as documented in its Kafka integration guide. These libraries are well developed, but they have limitations that Baryon intends to address:

  • Code-independent checkpointing

    Baryon's Kafka state management system allows Kafka consumption state to be stored across multiple runs of an application, even when there are code changes. Spark's checkpointing system does not support maintaining state across changes in code, so users of Spark's Kafka libraries must implement the offset management logic themselves.

  • Improved error handling

    Baryon handles Kafka-related errors more thoroughly than Spark's Kafka libraries do, so users do not need to handle Kafka failures in their own code.
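To make the idea of code-independent checkpointing concrete, here is a minimal sketch of keeping consumption state outside the application. It is not Baryon's implementation or API: the class `FileOffsetStore`, its methods, and the choice of a local properties file are all assumptions made for illustration. The point is only that offsets saved as plain data, keyed by topic and partition, can be read back by any later build of the application.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical illustration only; this is not Baryon's API.
// Offsets are stored as plain text keyed by "topic/partition", so the saved
// state does not depend on the classes or bytecode of the application.
final class FileOffsetStore {
    private final Path file;

    FileOffsetStore(Path file) {
        this.file = file;
    }

    private static String key(String topic, int partition) {
        return topic + "/" + partition;
    }

    // Read all saved offsets; an absent file means nothing was saved yet.
    private Properties load() {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    // Record the next offset to read for one topic partition.
    void commit(String topic, int partition, long nextOffset) {
        Properties props = load();
        props.setProperty(key(topic, partition), Long.toString(nextOffset));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "consumer offsets");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return the saved offset, or the fallback when none has been saved.
    long fetch(String topic, int partition, long fallback) {
        String value = load().getProperty(key(topic, partition));
        return value == null ? fallback : Long.parseLong(value);
    }
}
```

A run would call `commit` after a batch has been processed, and the next run, possibly built from changed code, would call `fetch` to decide where to resume. A local file is used here only to keep the sketch self-contained; a real deployment would need a store that survives the loss of the machine running the driver.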

Baryon also offers several features of its own:

  • Multiple consumption modes

    Baryon has two consumption modes, blocking and non-blocking, and switching between them requires no code changes. The blocking mode roughly corresponds to the consumption behavior of Spark's "direct" approach, while the non-blocking mode behaves similarly to Spark's receiver-based approach.

  • Dynamically configured topics

    Baryon supports changes to the set of Kafka topics that are consumed while the application is running. Alongside this, configurations can be set at a per-topic level, which makes it easier to build a single application to process multiple, heterogeneous data streams.

  • Aggregated metrics

    Baryon uses the spark-metrics library to collect and aggregate useful metrics across the driver and executors. These include offset lag, throughput, and error rates, as well as augmented versions of metrics that Spark already provides. The metrics are integrated with Spark's metrics system, so they are compatible with the reporting system that comes with Spark.
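The per-topic configuration idea can be sketched as a set of defaults overlaid with topic-specific overrides, where the set of topics can change while the program runs. This is an illustration under stated assumptions, not Baryon's API: the class `TopicConfigResolver`, its methods, and any setting keys passed to it are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and setting keys are not Baryon's.
final class TopicConfigResolver {
    private final Map<String, String> defaults;
    private final Map<String, Map<String, String>> perTopic = new HashMap<>();

    TopicConfigResolver(Map<String, String> defaults) {
        this.defaults = new HashMap<>(defaults);
    }

    // Add a topic or replace its overrides; may be called while running.
    synchronized void setOverrides(String topic, Map<String, String> overrides) {
        perTopic.put(topic, new HashMap<>(overrides));
    }

    // Stop tracking a topic and discard its overrides.
    synchronized void removeTopic(String topic) {
        perTopic.remove(topic);
    }

    // Snapshot of the topics that currently have overrides registered.
    synchronized Set<String> topics() {
        return new HashSet<>(perTopic.keySet());
    }

    // Effective settings for a topic: defaults overlaid with its overrides.
    synchronized Map<String, String> resolve(String topic) {
        Map<String, String> effective = new HashMap<>(defaults);
        effective.putAll(
            perTopic.getOrDefault(topic, Collections.<String, String>emptyMap()));
        return effective;
    }
}
```

Because `resolve` builds a fresh map on every call, a consumer loop that asks for its settings at the start of each batch picks up added, changed, or removed topics without a restart. That is the property the description above attributes to Baryon, shown here in the simplest form that demonstrates it.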

Quick Start

Add Baryon as a dependency:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>baryon</artifactId>
    <version>1.0</version>
</dependency>

To add custom metrics that are integrated with Spark, also add the spark-metrics library, which Baryon itself uses:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>spark-metrics</artifactId>
    <version>1.0</version>
</dependency>

Take a look at the examples to see how to write the driver and a ReceiverPlugin.

com.groupon.dse

Groupon

Versions

Version
1.0