avro-sql_2.11

Allows to run SQL over an Avro record to morph its structure

License	License Apache License 2.0
GroupId	GroupId com.landoop
ArtifactId	ArtifactId avro-sql_2.11
Last Version	Last Version 1.0.1
Release Date	Release Date Jan 12, 2018
Type	Type jar
Description	Description avro-sql_2.11 Allows to run SQL over an Avro record to morph its structure
Project URL	Project URL https://github.com/landoop/avro-sql
Source Code Management	Source Code Management https://github.com/landoop/avro-sql.git

Download avro-sql_2.11

Filename	Size
avro-sql_2.11-1.0.1.pom
avro-sql_2.11-1.0.1.jar	166 KB
avro-sql_2.11-1.0.1-sources.jar	10 KB
avro-sql_2.11-1.0.1-scaladoc.jar	362 KB
avro-sql_2.11-1.0.1-javadoc.jar	261 bytes
Browse

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.landoop/avro-sql_2.11/ -->
<dependency>
    <groupId>com.landoop</groupId>
    <artifactId>avro-sql_2.11</artifactId>
    <version>1.0.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.landoop/avro-sql_2.11/
implementation 'com.landoop:avro-sql_2.11:1.0.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.landoop/avro-sql_2.11/
implementation ("com.landoop:avro-sql_2.11:1.0.1")

Apache Buildr

'com.landoop:avro-sql_2.11:jar:1.0.1'

Apache Ivy

<dependency org="com.landoop" name="avro-sql_2.11" rev="1.0.1">
  <artifact name="avro-sql_2.11" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.landoop', module='avro-sql_2.11', version='1.0.1')
)

Scala SBT

libraryDependencies += "com.landoop" % "avro-sql_2.11" % "1.0.1"

Leiningen

[com.landoop/avro-sql_2.11 "1.0.1"]

Dependencies

compile (6)

Group / Artifact	Type	Version
org.scala-lang : scala-library	jar	2.11.11
com.typesafe.scala-logging : scala-logging_2.11	jar	3.5.0
com.landoop : sql-core_2.11	jar	1.0
io.confluent » kafka-avro-serializer	jar	4.0.0
com.sksamuel.avro4s : avro4s-core_2.11	jar	1.6.2
org.apache.calcite : calcite-core	jar	1.12.0

test (7)

Group / Artifact	Type	Version
org.mockito : mockito-core	jar	2.7.13
org.scalacheck : scalacheck_2.11	jar	1.11.1
org.scalatest : scalatest_2.11	jar	2.2.6
junit : junit	jar	4.12
org.apache.curator : curator-test	jar	3.1.0
org.powermock : powermock-module-junit4	jar	1.6.5
org.pegdown : pegdown	jar	1.1.0

Project Modules

There are no modules declared in this project.

Avro-Sql

This is a library allowing to transform the shape of an Avro record using SQL. It relies on Apache Calcite for the SQL parsing.

import AvroSql._
val record: GenericRecord = {...}
record.scql("SELECT name, address.street.name as streetName")

As simple as that!

Let's say we have the following Avro Schema:

{
  "type": "record",
  "name": "Pizza",
  "namespace": "com.landoop.sql.avro",
  "fields": [
    {
      "name": "ingredients",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Ingredient",
          "fields": [
            {
              "name": "name",
              "type": "string"
            },
            {
              "name": "sugar",
              "type": "double"
            },
            {
              "name": "fat",
              "type": "double"
            }
          ]
        }
      }
    },
    {
      "name": "vegetarian",
      "type": "boolean"
    },
    {
      "name": "vegan",
      "type": "boolean"
    },
    {
      "name": "calories",
      "type": "int"
    },
    {
      "name": "fieldName",
      "type": "string"
    }
  ]
}

using the library one can apply to types of queries:

to flatten it
to retain the structure while cherry-picking and/or rename fields The difference between the two is marked by the withstructure* keyword. If this is missing you will end up flattening the structure.

Let's take a look at the flatten first. There are cases when you are receiving a nested avro structure and you want to flatten the structure while being able to cherry pick the fields and rename them. Imagine we have the following Avro schema:

{
  "type": "record",
  "name": "Person",
  "namespace": "com.landoop.sql.avro",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          {
            "name": "street",
            "type": {
              "type": "record",
              "name": "Street",
              "fields": [
                {
                  "name": "name",
                  "type": "string"
                }
              ]
            }
          },
          {
            "name": "street2",
            "type": [
              "null",
              "Street"
            ]
          },
          {
            "name": "city",
            "type": "string"
          },
          {
            "name": "state",
            "type": "string"
          },
          {
            "name": "zip",
            "type": "string"
          },
          {
            "name": "country",
            "type": "string"
          }
        ]
      }
    }
  ]
}

Applying this SQL like syntax

SELECT 
    name, 
    address.street.*, 
    address.street2.name as streetName2 
FROM topic

the projected new schema is:

{
  "type": "record",
  "name": "Person",
  "namespace": "com.landoop.sql.avro",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "name_1",
      "type": "string"
    },
    {
      "name": "streetName2",
      "type": "string"
    }
  ]
}

There are scenarios where you might want to rename fields and maybe reorder them. By applying this SQL like syntax on the Pizza schema

SELECT 
       name, 
       ingredients.name as fieldName, 
       ingredients.sugar as fieldSugar, 
       ingredients.*, 
       calories as cals 
withstructure

we end up projecting the first structure into this one:

{
  "type": "record",
  "name": "Pizza",
  "namespace": "com.landoop.sql.avro",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "ingredients",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "Ingredient",
          "fields": [
            {
              "name": "fieldName",
              "type": "string"
            },
            {
              "name": "fieldSugar",
              "type": "double"
            },
            {
              "name": "fat",
              "type": "double"
            }
          ]
        }
      }
    },
    {
      "name": "cals",
      "type": "int"
    }
  ]
}

Flatten rules

you can't flatten a schema containing array fields
when flattening and the column name has already been used it will get a index appended. For example if field name appears twice and you don't specifically rename the second instance (name as renamedName) the new schema will end up containing: name and name_1

How to use it

import AvroSql._
val record: GenericRecord = {...}
record.scql("SELECT name, address.street.name as streetName")

As simple as that!

Query Examples

You can find more examples in the unit tests, however here are a few used:

flattening

//rename and only pick fields on first level
SELECT calories as C ,vegan as V ,name as fieldName FROM topic

//Cherry pick fields on different levels in the structure
SELECT name, address.street.name as streetName FROM topic

//Select and rename fields on nested level
SELECT name, address.street.*, address.street2.name as streetName2 FROM topic

retaining the structure

//you can select itself - obviousely no real gain on this
SELECT * FROM topic withstructure 

//rename a field 
SELECT *, name as fieldName FROM topic withstructure

//rename a complex field
SELECT *, ingredients as stuff FROM topic withstructure

//select a single field
SELECT vegan FROM topic withstructure

//rename and only select nested fields
SELECT ingredients.name as fieldName, ingredients.sugar as fieldSugar, ingredients.* FROM topic withstructure

Release Notes

0.1 (2017-05-03)

first release

Building

Requires gradle 3.4.1 to build.

To build

gradle compile

To test

gradle test

You can also use the gradle wrapper

./gradlew build

To view dependency trees

gradle dependencies #

Lenses.io

Please visit https://github.com/lensesio for Lenses.io's repositories

Versions

Version
1.0.1 Jan 12, 2018
1.0 Jan 10, 2018
0.3 Apr 21, 2017

avro-sql_2.11

License

GroupId

ArtifactId

Last Version

Release Date

Type

Description

Project URL

Source Code Management