arc-jupyter


License: MIT
GroupId: ai.tripl
ArtifactId: arc-jupyter_2.11
Last Version: 2.5.0
Type: jar
Description: arc-jupyter
Project URL: https://arc.tripl.ai
Project Organization: ai.tripl
Source Code Management: https://github.com/tripl-ai/arc-jupyter

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/ai.tripl/arc-jupyter_2.11/ -->
<dependency>
    <groupId>ai.tripl</groupId>
    <artifactId>arc-jupyter_2.11</artifactId>
    <version>2.5.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/ai.tripl/arc-jupyter_2.11/
implementation 'ai.tripl:arc-jupyter_2.11:2.5.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/ai.tripl/arc-jupyter_2.11/
implementation("ai.tripl:arc-jupyter_2.11:2.5.0")

Buildr:

'ai.tripl:arc-jupyter_2.11:jar:2.5.0'

Ivy:

<dependency org="ai.tripl" name="arc-jupyter_2.11" rev="2.5.0">
  <artifact name="arc-jupyter_2.11" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='ai.tripl', module='arc-jupyter_2.11', version='2.5.0')
)

SBT:

libraryDependencies += "ai.tripl" % "arc-jupyter_2.11" % "2.5.0"

Leiningen:

[ai.tripl/arc-jupyter_2.11 "2.5.0"]

Dependencies

compile (3)

Group / Artifact                            Type  Version
org.scala-lang : scala-library              jar   2.11.12
sh.almond : kernel_2.11                     jar   0.6.0
com.github.alexarchambault : case-app_2.11  jar   2.0.0-M9

provided (5)

Group / Artifact                     Type  Version
org.apache.spark : spark-core_2.11   jar   2.4.5
org.apache.spark : spark-sql_2.11    jar   2.4.5
org.apache.spark : spark-hive_2.11   jar   2.4.5
org.apache.spark : spark-mllib_2.11  jar   2.4.5
ai.tripl : arc_2.11                  jar   2.14.0

Project Modules

There are no modules declared in this project.

Arc-Jupyter is an interactive Jupyter Notebooks extension for building Arc data pipelines.

How to use

The only required configuration is the Java Virtual Machine memory allocation, which should be sized for your specific environment. For example, to allocate 4 gigabytes:

-e JAVA_OPTS="-Xmx4096m" \

The following docker run command exposes the Jupyter Notebook port (8888) and the Spark UI port (4040). Note that this example allocates 8 gigabytes (-Xmx8192m) rather than 4:

docker run \
-it \
--rm \
-e JAVA_OPTS="-Xmx8192m" \
--name arc-jupyter \
-p 4040:4040 \
-p 8888:8888 \
triplai/arc-jupyter:latest
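
As a small sketch of how the command above is put together, the following Python helper assembles the same docker run invocation from a heap size given in gigabytes. The function name build_docker_command and its parameters are hypothetical and exist only for illustration; the flags, ports, and image name are copied from the command above.

```python
import shlex


def build_docker_command(heap_gb, image="triplai/arc-jupyter:latest"):
    """Return the docker run argument list for a given JVM heap size.

    Hypothetical helper: it only mirrors the command shown in this README.
    """
    heap_mb = heap_gb * 1024  # -Xmx is given in megabytes, e.g. 4 GB -> 4096m
    return [
        "docker", "run",
        "-it",
        "--rm",
        "-e", f"JAVA_OPTS=-Xmx{heap_mb}m",
        "--name", "arc-jupyter",
        "-p", "4040:4040",  # Spark UI
        "-p", "8888:8888",  # Jupyter Notebook
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line; shlex.join quotes where needed.
    print(shlex.join(build_docker_command(8)))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to subprocess.run without shell quoting concerns.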

Additional Configurations

To set additional Spark configuration variables, create an environment variable whose name starts with conf_ followed by the Spark key with each . replaced by _. For example, use conf_spark_sql_inMemoryColumnarStorage_compressed to set spark.sql.inMemoryColumnarStorage.compressed (names are case sensitive).

Hadoop configurations can be set similarly:

conf_spark_hadoop_fs_s3a_aws_credentials_provider=com.amazonaws.auth.InstanceProfileCredentialsProvider
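
The naming rule above can be sketched in a few lines of Python. The function names spark_conf_to_env_name and docker_env_flags are hypothetical and not part of Arc-Jupyter, and passing the variables with docker's -e flag is an assumption based on how JAVA_OPTS is passed in the docker run command.

```python
def spark_conf_to_env_name(spark_key):
    """Map a Spark configuration key to its conf_ environment variable name."""
    # Dots become underscores; case is preserved because names are case sensitive.
    return "conf_" + spark_key.replace(".", "_")


def docker_env_flags(spark_conf):
    """Turn a dict of Spark settings into `-e NAME=value` docker arguments.

    Assumption: the variables are supplied to the container with -e,
    in the same way as JAVA_OPTS.
    """
    flags = []
    for key, value in spark_conf.items():
        flags += ["-e", f"{spark_conf_to_env_name(key)}={value}"]
    return flags


if __name__ == "__main__":
    settings = {"spark.sql.inMemoryColumnarStorage.compressed": "true"}
    print(docker_env_flags(settings))
```

This sketch covers only the forward direction, from Spark key to environment variable name.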

Capabilities

Magic           Description
%help           Display this help information.
%arc            Execute an Arc stage. Default.
%conf           Set configuration. Default master=local[*], numRows=20, truncate=50
%env            Set job variables via the notebook (e.g. %env ETL_CONF_KEY0=value0 ETL_CONF_KEY1=value1)
%metadata       Returns the metadata of an input view as a result set.
%printmetadata  Prints the Arc metadata JSON for the input view.
%printschema    Prints the Spark schema for the input view as text.
%schema         Prints the Spark schema for the input view.
%sql            Execute a SQL query and return the result set.
%version        Prints the version information of Arc Jupyter.
  • numRows defines the number of rows to return in a result table.
  • truncate defines the maximum number of characters displayed in a single result cell.
  • outputView defines the name of a temporary view under which the result set is registered.

Example

This example shows how to use the numRows, truncate and outputView options:

%sql numRows=10 truncate=100 outputView=green_tripdata0
SELECT *
FROM green_tripdata0_raw
WHERE fare_amount < 10
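
To make the layout of the first line explicit, here is a minimal Python sketch that splits a magic line into the magic name and its key=value options. It is not Arc-Jupyter's actual parser; parse_magic_line is a hypothetical name, and the sketch assumes options are separated by whitespace and contain no spaces.

```python
def parse_magic_line(line):
    """Split a line such as '%sql numRows=10' into (magic, options).

    Illustrative only: assumes whitespace-separated key=value options.
    """
    tokens = line.split()
    magic = tokens[0].lstrip("%")
    options = {}
    for token in tokens[1:]:
        # partition keeps everything after the first '=' as the value
        key, _, value = token.partition("=")
        options[key] = value
    return magic, options


if __name__ == "__main__":
    header = "%sql numRows=10 truncate=100 outputView=green_tripdata0"
    print(parse_magic_line(header))
```

Values are kept as strings here; converting numRows and truncate to integers is left to the caller.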

License

Arc-Jupyter is released under the MIT License.

The project is built with Almond, which is released under the BSD 3-Clause "New" or "Revised" License.

Versions

Version
2.5.0
2.4.2
2.4.1
2.4.0
2.3.3
2.3.2
2.3.1
2.3.0
2.2.0
2.1.1
2.1.0
2.0.3
2.0.2
2.0.1
2.0.0
1.10.0
1.9.3
1.9.2
1.9.1
1.9.0
1.8.1
1.8.0
1.7.1
1.7.0
1.6.1
1.6.0
1.5.0
1.4.0
1.3.0
1.2.0
1.1.0
1.0.0
0.0.14
0.0.13
0.0.12