Simple Schema

A library that provides an easier way to describe DataFrame schemas for Spark and [MLSQL](http://www.mlsql.tech).

GroupId: tech.mlsql
ArtifactId: simple-schema_2.12
Last Version: 0.2.0
Type: jar
Project URL: https://github.com/allwefantasy/simple-schema.git
Source Code Management: https://github.com/allwefantasy/simple-schema

How to add to project

Maven:

    <dependency>
        <groupId>tech.mlsql</groupId>
        <artifactId>simple-schema_2.12</artifactId>
        <version>0.2.0</version>
    </dependency>

Gradle (Groovy DSL):

    implementation 'tech.mlsql:simple-schema_2.12:0.2.0'

Gradle (Kotlin DSL):

    implementation("tech.mlsql:simple-schema_2.12:0.2.0")

Buildr:

    'tech.mlsql:simple-schema_2.12:jar:0.2.0'

Ivy:

    <dependency org="tech.mlsql" name="simple-schema_2.12" rev="0.2.0">
      <artifact name="simple-schema_2.12" type="jar" />
    </dependency>

Grape:

    @Grapes(
        @Grab(group='tech.mlsql', module='simple-schema_2.12', version='0.2.0')
    )

SBT:

    libraryDependencies += "tech.mlsql" % "simple-schema_2.12" % "0.2.0"

Leiningen:

    [tech.mlsql/simple-schema_2.12 "0.2.0"]

Dependencies

provided (4)

| Group / Artifact | Type | Version |
| --- | --- | --- |
| org.apache.spark : spark-core_2.11 | jar | 2.4.3 |
| org.apache.spark : spark-sql_2.11 | jar | 2.4.3 |
| org.apache.spark : spark-mllib_2.11 | jar | 2.4.3 |
| org.apache.spark : spark-graphx_2.11 | jar | 2.4.3 |

test (6)

| Group / Artifact | Type | Version |
| --- | --- | --- |
| org.scalactic : scalactic_2.11 | jar | 3.0.0 |
| org.scalatest : scalatest_2.11 | jar | 3.0.0 |
| org.pegdown : pegdown | jar | 1.6.0 |
| org.apache.spark : spark-catalyst_2.11 | jar | 2.4.3 |
| org.apache.spark : spark-core_2.11 | jar | 2.4.3 |
| org.apache.spark : spark-sql_2.11 | jar | 2.4.3 |

Project Modules

There are no modules declared in this project.

Simple Schema

A library that provides an easier way to describe DataFrame schemas for Spark and MLSQL.

Requirements

This library requires Spark 2.3+/2.4+ (tested).

Linking

You can link against this library in your program at the following coordinates:

Scala 2.11

groupId: tech.mlsql
artifactId: simple-schema_2.11
version: 0.2.0

Usage

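import org.apache.spark.sql.types.{StringType, StructField, StructType}
// SparkSimpleSchemaParser is provided by this library; import it from the package that defines it.
val s = SparkSimpleSchemaParser.parse("st(field(column1,string),field(column2,string),field(column3,string))")
assert(s == StructType(Seq(StructField("column1", StringType), StructField("column2", StringType), StructField("column3", StringType))))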

A Spark DataFrame schema is normally represented as JSON, but JSON is tedious to write and awkward to embed as plain text inside a quoted string. Simple Schema defines a more compact format that makes this easier.
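For comparison, here is a minimal sketch of the JSON route using only Spark's own API (no Simple Schema involved). It builds the three-column schema from the usage example, serializes it with `json`, and reads it back with `DataType.fromJson`. Printing the JSON string shows how much per-field detail (name, type, nullability, metadata) has to be written out by hand in that format.

```scala
import org.apache.spark.sql.types.{DataType, StringType, StructField, StructType}

// The same three string columns as in the usage example above.
val schema = StructType(Seq(
  StructField("column1", StringType),
  StructField("column2", StringType),
  StructField("column3", StringType)
))

// Spark's built-in JSON representation of the schema.
val json: String = schema.json
println(json)

// The JSON string round-trips back to an equal schema.
val restored = DataType.fromJson(json)
assert(restored == schema)
```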

st means StructType and field means StructField. The first value in field is the column name and the second is its type. Simple Schema currently supports the following keywords and types:

  1. st
  2. field
  3. string
  4. float
  5. double
  6. integer
  7. short
  8. date
  9. binary
  10. map
  11. array
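To make the shape of the format concrete, the following is a toy recursive-descent parser for the notation listed above. It is only an illustrative sketch and not the library's implementation: `ToySchemaParser` and its `Node` classes are hypothetical names, it produces a small tree of its own rather than Spark types, and it assumes column names consist of letters, digits, and underscores.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical toy parser, NOT the parser shipped with simple-schema.
object ToySchemaParser {
  sealed trait Node
  final case class Primitive(name: String) extends Node
  final case class ArrayOf(element: Node) extends Node
  final case class MapOf(key: Node, value: Node) extends Node
  final case class Struct(fields: List[(String, Node)]) extends Node

  private val primitives =
    Set("string", "float", "double", "integer", "short", "date", "binary")

  def parse(text: String): Node = {
    val src = text.filterNot(_.isWhitespace)
    var pos = 0

    // Reads a run of letters, digits, or underscores.
    def ident(): String = {
      val start = pos
      while (pos < src.length && (src(pos).isLetterOrDigit || src(pos) == '_')) pos += 1
      require(pos > start, s"identifier expected at offset $start")
      src.substring(start, pos)
    }

    // Consumes exactly one expected character.
    def expect(c: Char): Unit = {
      require(pos < src.length && src(pos) == c, s"'$c' expected at offset $pos")
      pos += 1
    }

    // field(name,type)
    def field(): (String, Node) = {
      require(ident() == "field", "field(...) expected")
      expect('(')
      val name = ident()
      expect(',')
      val tpe = dataType()
      expect(')')
      (name, tpe)
    }

    // st(...), map(k,v), array(e), or a primitive type name
    def dataType(): Node = ident() match {
      case "st" =>
        expect('(')
        val fields = ListBuffer(field())
        while (pos < src.length && src(pos) == ',') { pos += 1; fields += field() }
        expect(')')
        Struct(fields.toList)
      case "map" =>
        expect('('); val k = dataType(); expect(','); val v = dataType(); expect(')')
        MapOf(k, v)
      case "array" =>
        expect('('); val e = dataType(); expect(')')
        ArrayOf(e)
      case p if primitives(p) => Primitive(p)
      case other => throw new IllegalArgumentException(s"unknown type: $other")
    }

    val result = dataType()
    require(pos == src.length, s"unexpected trailing input at offset $pos")
    result
  }
}

import ToySchemaParser._

val tree = parse("st(field(column1,map(string,string)))")
assert(tree == Struct(List("column1" -> MapOf(Primitive("string"), Primitive("string")))))
```

The grammar is small enough that one function per construct suffices: `st` holds one or more comma-separated `field` entries, while `map` and `array` simply recurse into their element types.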

Suppose you have the following JSON data:

{"column1":{"key":"value"}}

You can describe it like this:

st(field(column1,map(string,string)))
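Written directly against Spark's type API, that description would be expected to correspond to the schema sketched below. This assumes the parser keeps Spark's defaults for nullability (nullable fields, nullable map values), which the text above does not state.

```scala
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

// Hand-written Spark equivalent of st(field(column1,map(string,string))).
// Assumption: Spark's default nullability settings are used.
val mapSchema = StructType(Seq(
  StructField("column1", MapType(StringType, StringType))
))

assert(mapSchema("column1").dataType == MapType(StringType, StringType, valueContainsNull = true))
```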

st also supports nesting:

st(field(column1,map(string,array(st(field(columnx,string))))))
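As a rough illustration, the nested description above would match JSON data such as `{"column1":{"key":[{"columnx":"value"}]}}`. The hand-written Spark schema below is a sketch of the same structure, again assuming Spark's default nullability.

```scala
import org.apache.spark.sql.types.{ArrayType, MapType, StringType, StructField, StructType}

// Innermost struct: st(field(columnx,string))
val inner = StructType(Seq(StructField("columnx", StringType)))

// Outer struct: column1 maps string keys to arrays of the inner struct.
val nestedSchema = StructType(Seq(
  StructField("column1", MapType(StringType, ArrayType(inner)))
))

assert(nestedSchema("column1").dataType ==
  MapType(StringType, ArrayType(inner, containsNull = true), valueContainsNull = true))
```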

Versions

Version
0.2.0