Spark-Riak Connector
The Spark-Riak connector lets you connect Spark applications to Riak KV and Riak TS through the Spark RDD and Spark DataFrames APIs. You can write your app in Scala, Python, or Java. The connector makes it easy to partition the data you read from Riak so that multiple Spark workers can process it in parallel, and it supports failover if a Riak node goes down while your Spark job is running.
Features
- Construct a Spark RDD from a Riak KV bucket with a set of keys
- Construct a Spark RDD from a Riak KV bucket by using a 2i string index or a set of indexes
- Construct a Spark RDD from a Riak KV bucket by using a 2i range query or a set of ranges
- Map JSON formatted data from Riak KV to user defined types
- Save a Spark RDD into a Riak KV bucket and apply 2i indexes to the contents
- Construct a Spark DataFrame from a Riak TS table using range queries and schema discovery
- Save a Spark DataFrame into a Riak TS table
- Construct a Spark RDD using a Riak KV bucket's enhanced 2i query (a.k.a. full bucket read)
- Perform parallel full bucket reads from a Riak KV bucket into multiple partitions
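The idea behind range queries and parallel reads in the list above can be illustrated with a small, connector-independent sketch: one integer 2i range is cut into contiguous sub-ranges so that each Spark partition could run its own range query. The function name `split_range` and its parameters are hypothetical and chosen only for this illustration; this is not the connector's own partitioning code, which may split data differently.

```python
from typing import List, Tuple


def split_range(start: int, end: int, partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive integer range [start, end] into at most
    `partitions` contiguous, non-overlapping inclusive sub-ranges.

    Hypothetical helper for illustration only; not part of the connector.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if end < start:
        raise ValueError("end must not be less than start")

    total = end - start + 1
    # Never create more sub-ranges than there are index values.
    count = min(partitions, total)
    # Spread the remainder over the first `extra` sub-ranges so that
    # sub-range sizes differ by at most one.
    base, extra = divmod(total, count)

    ranges = []
    lower = start
    for i in range(count):
        size = base + (1 if i < extra else 0)
        upper = lower + size - 1
        ranges.append((lower, upper))
        lower = upper + 1
    return ranges
```

For example, `split_range(1, 10, 3)` returns `[(1, 4), (5, 7), (8, 10)]`. Each pair could then serve as the bounds of one 2i range query handled by a separate Spark worker.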
Compatibility
- Riak TS 1.3.1+
- Apache Spark 1.6+
- Scala 2.10 and 2.11
- Java 8
Coming Soon
- Support for Riak KV 2.3 and later
Prerequisites
In order to use the Spark-Riak connector, you must have the following installed:
Spark-Riak Connector
Using the Spark-Riak Connector
- Configuration of Spark Context
- Failover Handling
- Reading Data From KV Bucket
- Writing Data To KV Bucket
- Writing Data To KV Bucket With 2i Indices
- Reading Data From TS Table
- Writing Data To TS Table
- Spark Dataframes With KV Bucket
- Spark Dataframes With TS Table
- Partitioning For KV Buckets
- Working With TS Dates
- Partitioning for Riak TS Table Queries
- TS Bulk Write
- Using Jupyter Notebook
- Spark Streaming
- Using Java With The Connector
Mailing List
The Riak Users Mailing List is highly trafficked and a great resource for technical discussions, Riak issues and questions, and community events and announcements.
We pride ourselves on answering every email that comes over the Riak Users Mailing List. Sign up and send away. If you prefer points for your questions, you can always tag Riak on Stack Overflow.
IRC
The #riak IRC room on irc.freenode.net is a great place for real-time help with your Riak issues and questions.
Reporting Bugs
To report a bug or issue, please open a new issue against this repository.
You can read the full guidelines for bug reporting on the Riak Docs.
Contributing
Basho encourages contributions to the Spark-Riak Connector from the community. Here’s how to get started.
- Fork the appropriate project that is affected by your change.
- Make your changes and run the test suite.
- Commit your changes and push them to your fork.
- Open pull requests for the appropriate projects.
- Basho engineers will review your pull request, suggest changes or offer feedback, and merge it when it’s ready.
License
Copyright © 2016 Basho Technologies
Licensed under the Apache License, Version 2.0