pipeline

Pipeline is an application framework to orchestrate and stitch multiple apps to run sequentially. The framework provides sequential dependencies, configuration sharing, restartability, and sequence checking (order of execution) for applications.

License: Apache
GroupId: io.github.bkosaraju
ArtifactId: pipeline_2.12
Last Version: 1.01
Type: zip
Project URL: https://bkosaraju.github.io/pipeline
Source Code Management: https://github.com/bkosaraju/pipeline


Dependencies

compile (10)

Group / Artifact Type Version
org.scala-lang : scala-library jar 2.12.12
org.postgresql : postgresql jar 42.2.6
org.jdbi : jdbi3-core jar 3.10.1
org.jdbi : jdbi3-postgres jar 3.10.1
org.jdbi : jdbi3-sqlobject jar 3.10.1
commons-beanutils : commons-beanutils jar 1.9.4
info.picocli : picocli jar 3.9.5
io.github.bkosaraju : utils_2.12 jar 1.02
io.prometheus : simpleclient jar 0.9.0
io.prometheus : simpleclient_pushgateway jar 0.9.0

Project Modules

There are no modules declared in this project.


Pipeline

Pipeline is an application framework to orchestrate and stitch multiple apps to run sequentially.

The framework provides sequential dependencies, configuration sharing, restartability, and sequence checking (order of execution) for applications.

A pipeline chains together multiple components, called tasks.

Task

A Task is the basic unit of execution. Supported task types:

  1. Running Spark apps on:

    • Kubernetes (spark_on_kube)
    • Databricks (spark_on_databricks)
    • EMR (spark_on_emr)
    • locally (spark)
  2. SFTP (read/write data from SFTP servers)

  3. S3Copy (copy data from S3)

  4. S3Push (push local data to S3)

  5. Arbitrary shell script execution

Each task has a configuration (task_config) which is fed to the application as a Map[String,String].

More documentation on these components, along with sample configuration properties, can be found in the Utils library.

Any sensitive value can be fed to the application by setting config_type to SSM, in which case it is retrieved from the AWS Systems Manager Parameter Store.
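
For illustration, a task_config might look like the following sketch (the key names are assumptions, not the framework's actual keys):

// Hypothetical task_config entries; key names are illustrative only
val taskConfig: Map[String, String] = Map(
  "taskType"    -> "spark_on_kube",     // one of the task types listed above
  "mainClass"   -> "com.example.MyApp", // assumed key name
  "config_type" -> "SSM"                // sensitive values resolved from SSM
)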

Job

A Job is a collection of tasks which are stitched together to run the apps.

There is a relation between job and task (job_task_order) which describes how the tasks should be sequenced; it also provides the capability to enable and disable individual tasks.
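
A sketch of how job_task_order could drive sequencing (the field names are illustrative, not the actual schema):

// Hypothetical job_task_order rows; field names are illustrative only
case class JobTaskOrder(jobId: Long, taskId: Long, order: Int, enabled: Boolean)

val jobTasks = Seq(
  JobTaskOrder(jobId = 1, taskId = 10, order = 1, enabled = true),
  JobTaskOrder(jobId = 1, taskId = 11, order = 2, enabled = false), // disabled: skipped
  JobTaskOrder(jobId = 1, taskId = 12, order = 3, enabled = true)
)

// Run the enabled tasks in their declared order
jobTasks.filter(_.enabled).sortBy(_.order).foreach(t => println(s"running task ${t.taskId}"))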

A job also has a configuration (job_config) which is applied to all tasks of the job.

The framework also validates whether the previous instance succeeded; in case of failure it holds off the current run (which you can override) to avoid historical data issues.

Global Configuration

Applied to all the jobs in the framework (common configurations).

Configuration precedence is task (highest) --> job --> global, which gives the flexibility to override a global config at the task level.
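
A minimal sketch of how this precedence can be realized with plain map merging (assumed mechanics, not the framework's actual implementation):

// In Scala's ++, later maps win, so task-level values override job and global ones
val globalConfig = Map("retries" -> "3", "outputPath" -> "/data/default")
val jobConfig    = Map("outputPath" -> "/data/jobA")
val taskConfig   = Map("retries" -> "5")

// Precedence: global (lowest) --> job --> task (highest)
val effectiveConfig: Map[String, String] = globalConfig ++ jobConfig ++ taskConfig
// Map(retries -> 5, outputPath -> /data/jobA)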

Application logging

All log metadata is captured in the meta_audit store, where audit records for each job execution (job_execution) and task execution (task_execution) are loaded (start time, end time, end status).
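
As an illustration, a task-level audit record could carry fields along these lines (a sketch; the actual meta_audit columns may differ):

// Hypothetical shape of a task_execution audit record (illustrative only)
case class TaskExecution(
  jobExecutionId: Long,
  taskId: Long,
  startTime: java.time.Instant,
  endTime: Option[java.time.Instant], // empty while the task is still running
  endStatus: String                   // e.g. "SUCCESS" or "FAILED"
)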

App Logs

When the application submits work to a target system (Kubernetes, Databricks, EMR), it holds on to the submission and fetches the logs back.

Application Properties

Basic application properties needed to get started.

Any sensitive value can be uploaded into SSM and referenced with the secret. prefix, in which case the application tries to retrieve it from the AWS Systems Manager Parameter Store.

jdbcHostname=<>
jdbcPort=5432
jdbcDatabase=<>
jdbcUsername=<>
secret.jdbcPassword=<SSM location>
#jdbcPassword=<not a good way to store locally>

Any parameter specified here has the same precedence as a global config.
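
A minimal sketch of the secret. convention using the AWS SDK for Java v2 (assumed mechanics; the framework's actual resolution code may differ):

import software.amazon.awssdk.services.ssm.SsmClient
import software.amazon.awssdk.services.ssm.model.GetParameterRequest

// Resolve "secret."-prefixed properties against the SSM Parameter Store
def resolveSecrets(props: Map[String, String], ssm: SsmClient): Map[String, String] =
  props.map {
    case (key, ssmPath) if key.startsWith("secret.") =>
      val req = GetParameterRequest.builder()
        .name(ssmPath)
        .withDecryption(true) // decrypt SecureString parameters on read
        .build()
      key.stripPrefix("secret.") -> ssm.getParameter(req).parameter().value()
    case other => other
  }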

Usage

Usage: pipeline [-ehrV] -d=<orderDateTime> -j=<jobId> -p=<properties>
Pipeline to run user applications in sequence example:
  -d, --ordertimestamp=<orderDateTime>
                        Job Order date must be specified in YYYY-mm-dd or
                          YYYY-mm-DDTHH:MM:SS[Z]
  -e, --endjobstatusandrun
                        Set the previous job execution status to END and run the job
  -h, --help            Show this help message and exit.
  -j, --jobid=<jobId>   jobId to be executed
  -p, --properties=<properties>
                        properties for input application
  -r, --restartjobstatusandrun
                        Set the previous job execution status to RERUN and run the job
  -V, --version         Print version information and exit.

#ex:
bin/pipeline  -j <job_id> -d <job_order_timestamp> -p <app_properties> -r

Alternatively, the fat jar can be used to run the same:

java -jar pipeline_2.12_<version>-all.jar -j <job_id> -d <job_order_timestamp> -p <app_properties> -r

Where can I get the latest release?

You can get the source from SCM.

Alternatively, you can pull binaries from the central Maven repositories. For Maven:

<dependency>
  <groupId>io.github.bkosaraju</groupId>
  <artifactId>pipeline_#ScalaVariant#</artifactId>
  <version>#Version#</version>
</dependency>
 
<!-- Fat / Assembly Jar -->
<dependency>
  <groupId>io.github.bkosaraju</groupId>
  <artifactId>pipeline_#ScalaVariant#</artifactId>
  <version>#Version#</version>
  <classifier>all</classifier>
</dependency>

For Gradle:

    api group: "io.github.bkosaraju", name: "pipeline_$scalaVariant", version: "$Version"

Build Instructions

./gradlew clean build

# Artifacts can be found in the build/lib directory

# Linux/Windows binaries can be found in the build/distribution directory

Scala Docs

Contributing

Please feel free to raise a pull request if you feel something can be updated or contributed.

License

Apache
