spark-records


License

GroupId
com.swoop

ArtifactId
spark-records_2.12

Last Version
3.0.1

Release Date

Type
jar

Description
spark-records

Project URL
https://github.com/swoop-inc/spark-records

Project Organization
com.swoop

Source Code Management
https://github.com/swoop-inc/spark-records

Download spark-records_2.12

How to add to project

Maven

<!-- https://jarcasting.com/artifacts/com.swoop/spark-records_2.12/ -->
<dependency>
    <groupId>com.swoop</groupId>
    <artifactId>spark-records_2.12</artifactId>
    <version>3.0.1</version>
</dependency>

Gradle (Groovy DSL)

// https://jarcasting.com/artifacts/com.swoop/spark-records_2.12/
implementation 'com.swoop:spark-records_2.12:3.0.1'

Gradle (Kotlin DSL)

// https://jarcasting.com/artifacts/com.swoop/spark-records_2.12/
implementation ("com.swoop:spark-records_2.12:3.0.1")

Buildr

'com.swoop:spark-records_2.12:jar:3.0.1'

Ivy

<dependency org="com.swoop" name="spark-records_2.12" rev="3.0.1">
  <artifact name="spark-records_2.12" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.swoop', module='spark-records_2.12', version='3.0.1')
)

SBT

libraryDependencies += "com.swoop" % "spark-records_2.12" % "3.0.1"

Leiningen

[com.swoop/spark-records_2.12 "3.0.1"]

Dependencies

compile (1)

Group / Artifact Type Version
org.scala-lang : scala-library jar 2.12.12

provided (8)

Group / Artifact Type Version
org.apache.spark : spark-core_2.12 jar 3.0.1
org.apache.spark : spark-core_2.12 jar 3.0.1
org.apache.spark : spark-sql_2.12 jar 3.0.1
org.apache.spark : spark-sql_2.12 jar 3.0.1
org.apache.logging.log4j : log4j-core jar 2.7
org.apache.logging.log4j : log4j-core jar 2.7
org.apache.logging.log4j : log4j-api jar 2.7
org.apache.logging.log4j : log4j-api jar 2.7

test (2)

Group / Artifact Type Version
org.scalatest : scalatest_2.12 jar 3.0.4
org.scalatest : scalatest_2.12 jar 3.0.4

Project Modules

There are no modules declared in this project.

Spark Records

Spark Records is a data processing pattern with an associated lightweight, dependency-free framework for Apache Spark v2+ that enables:

  1. Bulletproof data processing with Spark
    Your jobs will never unpredictably fail midway due to data transformation bugs. Spark Records gives you predictable failure control through instant data quality checks performed on metrics that are collected automatically during job execution, without any additional querying.

  2. Automatic row-level structured logging
    Exceptions generated during job execution are automatically associated with the data that caused them, down to nested exception causes and full stack traces. If you need to reprocess data, you can trivially and efficiently choose to process only the failed inputs.

  3. Lightning-fast root cause analysis
    Answer any question about the exceptions or warnings generated during job execution directly with SparkSQL or your favorite Spark DSL. Would you like to see the top 5 issues encountered during job execution, with example source data and the line in your code that caused the problem? You can.
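
The three points above can be illustrated with a minimal, Spark-free sketch in plain Scala. It is not the spark-records API: the names Issue, Rec, RecordsSketch, build, errorRate, failedInputs and topIssues are hypothetical and exist only for this example. The sketch assumes each input row is transformed independently, turns any non-fatal exception into data stored next to the input that caused it, and then derives a failure metric, the inputs to retry, and the most frequent issue kinds from the resulting records.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical types for illustration only; these are NOT the spark-records API.
final case class Issue(kind: String, message: String, stackTrace: String)
final case class Rec[In, Out](source: In, data: Option[Out], issue: Option[Issue])

object RecordsSketch {
  // Wrap a per-row transformation so that a non-fatal exception becomes data
  // attached to the input that caused it, instead of failing the whole job.
  def build[In, Out](input: In)(transform: In => Out): Rec[In, Out] =
    Try(transform(input)) match {
      case Success(out) => Rec(input, Some(out), None)
      case Failure(e) =>
        val trace = e.getStackTrace.mkString("\n")
        Rec(input, None, Some(Issue(e.getClass.getName, String.valueOf(e.getMessage), trace)))
    }

  // Metric for a data quality check: the fraction of inputs that failed.
  def errorRate[In, Out](records: Seq[Rec[In, Out]]): Double =
    if (records.isEmpty) 0.0
    else records.count(_.issue.isDefined).toDouble / records.size

  // The inputs that need reprocessing: only the ones that produced an issue.
  def failedInputs[In, Out](records: Seq[Rec[In, Out]]): Seq[In] =
    records.collect { case Rec(source, _, Some(_)) => source }

  // The n most frequent issue kinds, each with its count and one example input.
  def topIssues[In, Out](records: Seq[Rec[In, Out]], n: Int): Seq[(String, Int, In)] =
    records
      .collect { case Rec(source, _, Some(issue)) => (issue.kind, source) }
      .groupBy { case (kind, _) => kind }
      .toSeq
      .map { case (kind, hits) => (kind, hits.size, hits.head._2) }
      .sortBy { case (_, count, _) => -count }
      .take(n)
}

object Demo {
  // Some inputs parse cleanly, some are malformed, and one divides by zero.
  val inputs: Seq[String] = Seq("1", "abc", "0", "5", "", "x")

  val records: Seq[Rec[String, Int]] =
    inputs.map(in => RecordsSketch.build(in)(s => 100 / s.trim.toInt))

  val rate: Double = RecordsSketch.errorRate(records)
  val retry: Seq[String] = RecordsSketch.failedInputs(records)
  val top: Seq[(String, Int, String)] = RecordsSketch.topIssues(records, 1)
}
```

In a real job the same idea would presumably run inside a Spark transformation, with the counts gathered during execution rather than by a second pass over the records. A caller could then compare a value such as Demo.rate against a threshold of its own choosing before deciding whether to publish the output.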

Spark Records has been tested with petabyte-scale data at Swoop. The library was extracted out of Swoop's production systems to share with the Spark community.

See the documentation for more information or watch the Spark Summit talk (slides).

Installation

Add the following resolver and dependency to your SBT build definition:

resolvers += Resolver.bintrayRepo("swoop-inc", "maven")

libraryDependencies += "com.swoop" %% "spark-records" % "<version>"

You can find all released versions here.

Community

Contributions and feedback of any kind are welcome.

Spark Records is maintained by Sim Simeonov and the team at Swoop.

Special thanks to Reynold Xin and Michael Armbrust for many interesting conversations about better ways to use Spark.

Development

Build docs microsite

sbt "project docs" makeMicrosite

Run docs microsite locally (run under target/site folder)

jekyll serve -b /spark-records

More details

License

spark-records is Copyright © 2017 Simeon Simeonov and Swoop, Inc. It is free software, and may be redistributed under the terms of the LICENSE.

Versions

Version
3.0.1