kafka-connect-arangodb

Kafka Connect Sink Connector for ArangoDB

License

License

GroupId

GroupId

io.github.jaredpetersen
ArtifactId

ArtifactId

kafka-connect-arangodb
Last Version

Last Version

1.0.7
Release Date

Release Date

Type

Type

jar
Description

Description

kafka-connect-arangodb
Kafka Connect Sink Connector for ArangoDB
Project URL

Project URL

https://github.com/jaredpetersen/kafka-connect-arangodb
Source Code Management

Source Code Management

https://github.com/jaredpetersen/kafka-connect-arangodb

Download kafka-connect-arangodb

How to add to project

<!-- https://jarcasting.com/artifacts/io.github.jaredpetersen/kafka-connect-arangodb/ -->
<dependency>
    <groupId>io.github.jaredpetersen</groupId>
    <artifactId>kafka-connect-arangodb</artifactId>
    <version>1.0.7</version>
</dependency>
// https://jarcasting.com/artifacts/io.github.jaredpetersen/kafka-connect-arangodb/
implementation 'io.github.jaredpetersen:kafka-connect-arangodb:1.0.7'
// https://jarcasting.com/artifacts/io.github.jaredpetersen/kafka-connect-arangodb/
implementation ("io.github.jaredpetersen:kafka-connect-arangodb:1.0.7")
'io.github.jaredpetersen:kafka-connect-arangodb:jar:1.0.7'
<dependency org="io.github.jaredpetersen" name="kafka-connect-arangodb" rev="1.0.7">
  <artifact name="kafka-connect-arangodb" type="jar" />
</dependency>
@Grapes(
@Grab(group='io.github.jaredpetersen', module='kafka-connect-arangodb', version='1.0.7')
)
libraryDependencies += "io.github.jaredpetersen" % "kafka-connect-arangodb" % "1.0.7"
[io.github.jaredpetersen/kafka-connect-arangodb "1.0.7"]

Dependencies

compile (1)

Group / Artifact Type Version
com.arangodb : arangodb-java-driver jar 6.3.0

provided (2)

Group / Artifact Type Version
org.apache.kafka : connect-api jar 2.4.0
org.apache.kafka : connect-json jar 2.4.0

test (4)

Group / Artifact Type Version
org.slf4j : slf4j-nop jar 1.7.28
org.junit.jupiter : junit-jupiter-api jar 5.6.0
org.junit.jupiter : junit-jupiter-engine jar 5.6.0
org.mockito : mockito-core jar 3.2.4

Project Modules

There are no modules declared in this project.

Kafka Connect ArangoDB Connector

Build Status Maven Central

Kafka Connect Sink Connector for ArangoDB

Usage

Kafka Connect ArangoDB is a Kafka Connector that translates record data into REPSERT and DELETE queries that are performed against ArangoDB. Only sinking data is supported at this time.

Requires ArangoDB 3.4 or higher.

A full example of how Kafka Connect ArangoDB can be integrated into a Kafka cluster is available in the development documentation.

Record Formats and Structures

The following record formats are supported:

  • Avro (Recommended)
  • JSON with Schema
  • Plain JSON

With each of these formats, the record value can be structured in one of the following ways:

  • Simple (Default)
  • Change Data Capture

Simple

The Simple format is a slim record value structure that only provides the information necessary for writing to the database.

When written as plain JSON, the record value looks something like:

{
  "someKey": "changed value",
  "otherKey": true
}

When the Kafka Connect ArangoDB Connector receives records adhering to this format, it will translate it into the following ArangoDB database changes and perform them:

  • Records with a null value are "tombstone" records and will result in the deletion of the document from the database
  • Records with a non-null value will be repserted (replace the full document if it exists already, insert the full document if it does not)

This format is the default, so no extra configuration is required.

CDC

The CDC format is a record value structure that is designed to handle records produced by Change Data Capture systems like Debezium. Each record value should be an object with properties before and after that store the "before" and "after" state of the document, respectively.

When written as plain JSON, the record value looks something like:

{
  "before": {
    "someKey": "some value",
    "otherKey": false
  },
  "after": {
    "someKey": "changed value",
    "otherKey": true
  }
}

When the Kafka Connect ArangoDB Connector receives this data, it will translate it into the following ArangoDB database changes and perform them:

  • Records with a null value are "tombstone" records and are are ignored
  • Record with a non-null value and a null value for after will result in the deletion of the document
  • Records with a non-null value and a non-null value for after will be repserted (replace the full document if it exists already, insert the full document if it does not)

To use this record format, configure it as a Kafka Connect Single Message Transformation in the connector's config:

{
  . . .
  "transforms": "cdc",
  "transforms.cdc.type": "io.github.jaredpetersen.kafkaconnectarangodb.sink.transforms.Cdc"
}

Topics

The name of the topic determines the name of the collection the record will be written to.

Record with topics that are just a plain string like products will go into a collection with the name products. If the record's topic name is period-separated like dbserver1.mydatabase.customers, the last period-separated value will be the collection's name (customers in this case). Each configured Kafka Connect ArangoDB Connector will only output data into a single database instance.

Foreign Keys and Edge Collections

In most situations, the record values that you will want to sink into ArangoDB is not in a format that ArangoDB can use effectively. ArangoDB has it's own format for foreign keys ({ "foreignKey": "MyCollection/1234" }) and edges between vertices ({ "_from": "MyCollection/1234", "_to": "MyCollection/5678" }) that your input data likely doesn't implement by default. It is recommended that you build your own custom Kafka Streams application to perform these mappings.

Configuration

Connector Properties

Name Description Type Default Importance
arangodb.host ArangoDB server host. string high
arangodb.port ArangoDB server host port number. int high
arangodb.user ArangoDB connection username. string high
arangodb.password ArangoDB connection password. password "" high
arangodb.useSsl ArangoDB use SSL connection. boolean false high
arangodb.database.name ArangoDB database name. string high

Single Message Transformations

Type Description
io.github.jaredpetersen.kafkaconnectarangodb.sink.transforms.Cdc Converts records from CDC format to Simple format.

Versions

Version
1.0.7
1.0.6
1.0.5
1.0.4
1.0.3
1.0.2
1.0.1
1.0.0