Lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Principles
- Configuration as code
- Idempotent execution
- Utilities that make it easier to build and test Apache Spark-based applications
Start using Lighthouse
In your build.sbt, add this:
libraryDependencies += "be.dataminded" %% "lighthouse" % "<version>"
libraryDependencies += "be.dataminded" %% "lighthouse-testing" % "<version>" % Test
If you are using Maven, add this to your pom.xml:
<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse_2.11</artifactId>
    <version>[version]</version>
</dependency>
<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse-testing_2.11</artifactId>
    <version>[version]</version>
    <scope>test</scope>
</dependency>
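The sbt and Maven setups above should resolve to the same artifacts: sbt's %% operator appends the project's Scala binary version to the artifact name, so in a Scala 2.11 project "lighthouse" becomes lighthouse_2.11, the artifactId used in the Maven snippet. The following is a minimal build.sbt sketch under that assumption. The Scala patch version and the lighthouseVersion value are illustrative and not taken from the Lighthouse documentation; the version placeholder still has to be replaced with an actual release.

```scala
// build.sbt: a minimal sketch, not the project's official build definition.

// Assumption: the project builds against Scala 2.11, matching the `_2.11`
// suffix of the Maven artifact IDs. The patch version here is illustrative.
scalaVersion := "2.11.12"

// Hypothetical helper value; replace the placeholder with a released version.
val lighthouseVersion = "<version>"

libraryDependencies ++= Seq(
  // `%%` appends the Scala binary version, so this resolves to lighthouse_2.11.
  "be.dataminded" %% "lighthouse"         % lighthouseVersion,
  // Test scope keeps the testing utilities out of the packaged application.
  "be.dataminded" %% "lighthouse-testing" % lighthouseVersion % Test
)
```

Defining the version once keeps the library and its testing companion in step when upgrading.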
Online Documentation
This README only covers basic setup. A more complete tutorial is available at: https://datamindedbe.github.io/lighthouse/tutorial/