Lenses for Apache Kafka
Lenses offers SQL (for data browsing and Kafka Streams), Kafka Connect connector management, cluster monitoring and more.
You can find more on lenses.io
Stream Reactor
A collection of components to build a real time ingestion pipeline.
Kafka Compatibility
- Kafka 2.5+ (Confluent 5.5) - Stream reactor 2.0.0+
- Kafka 2.0 -> 2.4 (Confluent 5.4) - Stream reactor 1.2.7
Connectors
Please take a moment and read the documentation and make sure the software prerequisites are met!!
Connector | Type | Description | Docs |
---|---|---|---|
AWS S3 | Sink | Copy data from Kafka to AWS S3. | Docs |
AzureDocumentDb | Sink | Copy data from Kafka and Azure Document Db. | Docs |
Bloomberg | Source | Copy data from Bloomberg streams and Kafka. | Docs |
Cassandra | Source | Copy data from Cassandra and Kafka. | Docs |
*Cassandra | Sink | Certified DSE Cassandra, copy data from Kafka to Cassandra. | Docs |
Coap | Source | Copy data from IoT Coap endpoints (using Californium) to Kafka. | Docs |
Coap | Sink | Copy data from Kafka to IoT Coap endpoints. | Docs |
Elastic 6 | Sink | Copy data from Kafka to Elastic Search 6.x w. tcp or http | Docs |
FTP/HTTP | Source | Copy data from FTP/HTTP to Kafka. | Docs |
Hazelcast | Sink | Copy data from Kafka to Hazelcast. | Docs |
HBase | Sink | Copy data from Kafka to HBase. | Docs |
Hive | Source | Copy data from Hive/HDFS to Kafka. | Docs |
Hive | Sink | Copy data from Kafka to Hive/HDFS | Docs |
InfluxDb | Sink | Copy data from Kafka to InfluxDb. | Docs |
Kudu | Sink | Copy data from Kafka to Kudu. | Docs |
JMS | Source | Copy data from JMS topics/queues to Kafka. | Docs |
JMS | Sink | Copy data from Kafka to JMS. | Docs |
MongoDB | Sink | Copy data from Kafka to MongoDB. | Docs |
MQTT | Source | Copy data from MQTT to Kafka. | Docs |
MQTT | Sink | Copy data from Kafka to MQTT. | Docs |
Pulsar | Source | Copy data from Pulsar to Kafka. | Docs |
Pulsar | Sink | Copy data from Kafka to Pulsar. | Docs |
Redis | Sink | Copy data from Kafka to Redis. | Docs |
ReThinkDB | Source | Copy data from RethinkDb to Kafka. | Docs |
ReThinkDB | Sink | Copy data from Kafka to RethinkDb. | Docs |
VoltDB | Sink | Copy data from Kafka to Voltdb. | Docs |
Release Notes
2.1.3
Move to connect-common 2.0.5 that adds complex type support to KCQL
2.1.2
- AWS S3 Sink Connector
- Prevent null pointer exception in converters when maps are presented will null values
- Offset reader optimisation to reduce S3 load
- Ensuring that commit only occurs after the preconfigured time interval when using WITH_FLUSH_INTERVAL
- AWS S3 Source Connector (New Connector)
- Cassandra Source Connector
- Add Bucket Timeseries Mode
- Reduction of logging noise
- Proper handling of uninitialized connections on task stop()
- Elasticsearch Sink Connector
- Update default port
- Hive Sink
- Improve Orc format handling
- Fixing issues with partitioning by non-string keys
- Hive Source
- Ensuring newly written files can be read by the hive connector by introduction of a refresh frequency configuration option.
- Redis Sink
- Correct Redis writer initialisation
2.1.0
- AWS S3 Sink Connector
- Elasticsearch 7 Support
2.0.1
-
Hive Source
- Rename option
connect.hive.hive.metastore
toconnect.hive.metastore
- Rename option
connect.hive.hive.metastore.uris
toconnect.hive.metastore.uris
- Rename option
-
Fix Elastic start up NPE
-
Fix to correct batch size extraction from KCQL on Pulsar
2.0.0
- Move to Scala 2.12
- Move to Kafka 2.4.1 and Confluent 5.4
Deprecated: * Druid Sink (not scala 2.12 compatible) * Elastic Sink (not scala 2.12 compatible) * Elastic5 Sink(not scala 2.12 compatible) * RabbitMQ (not support and JMS connector can be used)
-
Redis
- Add support for Redis Streams
-
Cassandra
- Add support for setting the LoadBalancer policy on the Cassandra Sink
-
ReThinkDB
- Use SSL connection on Rethink initialize tables is ssl set
-
FTP Source
- Respect "connect.ftp.max.poll.records" when reading slices
-
MQTT Source
- Allow lookup of avro schema files with wildcard subscriptions
1.2.7
Features
-
MQTT Source
Support dynamic topic names in Kafka from a wildcard subscription.
Example: INSERT INTO
$
SELECT * FROM /mqttSourceTopic/+/testIf the MQTT topic is /mqttSourceTopic/A/test this Will result in topics in kafka mqttSourceTopic_A_test
-
Cassandra (source)
-
Support for sending JSON formatted message (with string key) to kafka topic.
Sample KCQL would be like:
INSERT INTO <topic> SELECT <fields> FROM <column_family> PK <PK_field> WITHFORMAT JSON WITHUNWRAP INCREMENTALMODE=<mode> WITHKEY(<key_field>)
This would send field's values as JSON object to the said topic.
Note that in kafka connect properties one needs to set
key.converter
andvalue.converter
asorg.apache.kafka.connect.storage.StringConverter
-
Added a new INCREMENTALMODE called dsesearchtimestamp that will make a DSE Search queries using Solr instead of a native Cassandra query.
Instead of the native query:
SELECT a, b, c, d FROM keyspace.table WHERE pkCol > ? AND pkCol <= ? ALLOW FILTERING; We will have now the query with Solr on the dsesearchtimestamp INCREMENTALMODE:
SELECT a, b, c, d FROM keyspace.table WHERE solr_query=?; Where the solr_query will be something like this:
pkCol:{2020-03-23T15:02:21Z TO 2020-03-23T15:30:12.989Z]
-
-
AzureDocumentDB
- Move to version 2.x since 1.x is deprecated in May 2020
Bug fixes
-
JMS Source
Allow for tasks parallelization and how the connector tasks parallelization is decided.
Changes:
- Allow the connector to respect the
tasks.max
value provided if the userconnect.jms.scale.type
. Available values arekcql
anddefault
. IfKCQL
is provided it will be based on the number of KCQL statements written, otherwise it will be driven based on the connectortasks.max
- Allow the connector to respect the
-
Kudu Sink
Handle null decimal types correctly
-
Mongo Sink
Handle decimal types
1.2.4 Bug fixes
-
JMS Source
Ack the JMS messages was not always possible. Also there was an issue with producing the messages to Kafka out of order from the JMS queue. Changes:
- Queue messages order are retained when published to Kafka (although they might be routed to different partitions)
- Ack happens for each message. This is a change from previous behaviour.
- Records which fail to be committed to Kafka are not ack-ed on JMS side
1.2.3 Features
- Influx
- Support for referencing _key values
- Support Unix timestamp as double
- MQTT
- Replicate shared subscription to all tasks
- Add sink config to specify retained messages
- Add a config to specify retained messages
- Hazelcast
- SSL support
- MongoDB
- SSL support
- Removing database name dashes restriction
- FTP
- FTPS support
Bug fixes
- Hive
- Fix for writing nested structures to Hive
- Improves the code for the async function call to use the CAS
1.2.2 Features
- Redis
- TTL Support
- SSL support
- AWS ElasticCache support
- GEOADD support
- PUB/SUB support
- MQTT
- Multi server connection
- Dynamic Target support
- Hive
- Kerberos support
- Kudu
- Comma separated master endpoints
Bug fixes
- Redis
- Topic regex
- JMS
- Filters out Kafka records with null value
- Cassandra
- Timestamp comparison
- PK check for incremental
- Mongo
- Memory leak
1.2.1
- Fixed Set support on the Cassandra source connector
- Support Array type in InfluxDB connector
- Fixed records out of order when insert on the Kudu sink connector
- Upgrade to kafka 2.1.0
- Added support for custom delimiter in composite primary keys on the Redis sink connector
1.2.0
- Upgrade to Kafka 2.0
- New Hive source and sink connector supporting Avro, Parquet and ORC
- Fix on NPE for Redis multiple sorted sets
- Fixed setting Mongo primary _id field in upsert mode
- Fix on handling multiple topics in Redis sort set
- Fixed mongodb sink exception when PK is compound key
- Fixed JMS sink with password is not working, wrong context
- Fixed handling multiple primary keys for sorted sets
- Fixed Kudu sink autocreate adding unnecessary partition
- Fixed Avro field with default value does not create table in Kudu
- Fixed Kudu Connector Can Not AutoCreate Table from Sink Record
- Fixed JMS sink session rollback exception if session is closed
1.1.0
- Upgrade to Kafka 1.1.0
- Added SSL, subscription, partitioning, batching and key selection to Pulsar source and sink
- Elastic6 connector @caiooliveiraeti !
- HTTP Basic Auth for Elasticsearch http client thanks @justinsoong !
- Add polling timeout on the JMS source connector to avoid high CPU in the source connector poll thanks #373 @matthedude
- Fixes on the elastic primary key separator thanks @caiooliveiraeti!
- Fix on the MQTT class loader
- Fix on the JMS class loader
- Fix on JMS to close down connections cleanly #363 thanks @matthedude!
- Fix on MQTT to correctly handle authentication
- Moved MongoDB batch size to KCQL.
connect.mongodb.batch.size
is deprecated - Added
connect.mapping.collection.to.json
to treat maps, list, sets as json when inserting into Cassandra - Added support for Elastic Pipelines thanks @caiooliveiraeti!
- Moved ReThinkDB batch size to KCQL
connect.rethink.batch.size
is deprecated - MQTT source allows full control of matching the topic
INSERT INTO targetTopic SELECT * FROM mqttTopic ... WITHREGEX=`$THE_REGEX`
- Upgrade Kudu Client to 0.7
- Upgrade Azure documentDB client to 1.16.0
- Upgrade Elastic5 to elastic4s 5.6.5
- Upgrade Elastic6 to elastic4s 6.2.5
- Upgrade Hazelcast client to 3.10
- Upgrade InfluxDB client to 2.9
- Upgrade MongoDB client to 3.6.3
- Upgrade Redis client to 2.9
- Kudu connector now accepts a comma separated list of master addresses
- Added missing
connect.elastic.retry.interval
to elastic5 and elastic6 - Added a default value set property to Cassandra to allow
DEFAULT UNSET
to be added on insert. Omitted columns from maps default to null. Alternatively, if setUNSET
, pre-existing value will be preserved - Cassandra source batch size now in KCQL.
connect.cassandra.batch.size
is deprecated .
1.0.0
- Kafka 1.0.0 Support
0.4.0
- Add FTPS support to FTP connector, new configuration option
ftp.protocol
introduced, either ftp (default) or ftps. - Fix for MQTT source High CPU Thanks @masahirom!
- Improve logging on Kudu
- DELETE functionality add to the Cassandra sink, deletion now possible for null payloads, thanks @sandonjacobs !
- Fix in kafka-connect-common to handle primary keys with doc strings thanks, @medvekoma !
- Fix writing multiple topics to the same table in Cassandra #284
- Upgrade to Cassandra driver 3.3.0 and refactor Cassandra tests
- Fix on JMS source transacted queues #285 thanks @matthedude !
- Fix on Cassandra source, configurable timespan queries. You can now control the timespan the Connector will query for
- Allow setting initial query timestamp on Cassandra source
- Allow multiple primary keys on the redis sink
0.3.0
- Upgrade CoAP to 2.0.0-M4
- Upgrade to Confluent 3.3 and Kafka 0.11.0.0.
- Added MQTT Sink.
- Add MQTT wildcard support.
- Upgrade CoAP to 2.0.0-M4.
- Added WITHCONVERTERS and WITHTYPE to JMS and MQTT connectors in KCQL to simplify configuration.
- Added FLUSH MODE to Kudu. Thanks! @patsak
0.2.6
Features
- Added MQTT Sink
- Upgrade to Confluent 3.2.2
- Upgrade to KCQL 2x
- Add CQL generator to Cassandra source
- Add KCQL INCREMENTALMODE support to the Cassandra source, bulk mode and the timestamp column type is now take from KCQL
- Support for setting key and truststore type on Cassandra connectors
- Added token based paging support for Cassandra source
- Added default bytes converter to JMS Source
- Added default connection factory to JMS Source
- Added support for SharedDurableConsumers to JMS Connectors
- Upgraded JMS Connector to JMS 2.0
- Moved to Elastic4s 2.4
- Added Elastic5s with TCP, TCP+XPACK and HTTP client support
- Upgrade Azure Documentdb to 1.11.0
- Added optional progress counter to all connectors, it can be enabled with
connect.progress.enabled
which will periodically report log messages processed - Added authentication and TLS to ReThink Connectors
- Added TLS support for ReThinkDB, add batch size option to source for draining the internal queues.
- Upgrade Kudu Client to 1.4.0
- Support for dates in Elastic Indexes and custom document types
- Upgrade Connect CLI to 1.0.2 (Renamed to connect-cli)
Bug Fixes
- Fixes for high CPU on CoAP source
- Fixes for high CPU on Cassandra source
- Fixed Avro double fields mapping to Kudu columns
- Fixes on JMS properties converter, Invalid schema when extracting properties
Misc
- Refactored Cassandra Tests to use only one embedded instance
- Removed unused batch size and bucket size options from Kudu, they are taken from KCQL
- Removed unused batch size option from DocumentDb
- Rename Azure DocumentDb
connect.documentdb.db
toconnect.documentdb.db
- Rename Azure DocumentDb
connect.documentdb.database.create
toconnect.documentdb.db.create
- Rename Cassandra Source
connect.cassandra.source.kcql
toconnect.cassandra.kcql
- Rename Cassandra Source
connect.cassandra.source.timestamp.type
toconnect.cassandra.timestamp.type
- Rename Cassandra Source
connect.cassandra.source.import.poll.interval
toconnect.cassandra.import.poll.interval
- Rename Cassandra Source
connect.cassandra.source.error.policy
toconnect.cassandra.error.policy
- Rename Cassandra Source
connect.cassandra.source.max.retries
toconnect.cassandra.max.retries
- Rename Cassandra Sink
connect.cassandra.source.retry.interval
toconnect.cassandra.retry.interval
- Rename Cassandra Sink
connect.cassandra.sink.kcql
toconnect.cassandra.kcql
- Rename Cassandra Sink
connect.cassandra.sink.error.policy
toconnect.cassandra.error.policy
- Rename Cassandra Sink
connect.cassandra.sink.max.retries
toconnect.cassandra.max.retries
- Rename Cassandra Sink Sink
connect.cassandra.sink.retry.interval
toconnect.cassandra.retry.interval
- Rename Coap Source
connect.coap.bind.port
toconnect.coap.port
- Rename Coap Sink
connect.coap.bind.port
toconnect.coap.port
- Rename Coap Source
connect.coap.bind.host
toconnect.coap.host
- Rename Coap Sink
connect.coap.bind.host
toconnect.coap.host
- Rename MongoDb
connect.mongo.database
toconnect.mongo.db
- Rename MongoDb
connect.mongo.sink.batch.size
toconnect.mongo.batch.size
- Rename Druid
connect.druid.sink.kcql
toconnect.druid.kcql
- Rename Druid
connect.druid.sink.conf.file
toconnect.druid.kcql
- Rename Druid
connect.druid.sink.write.timeout
toconnect.druid.write.timeout
- Rename Elastic
connect.elastic.sink.kcql
toconnect.elastic.kcql
- Rename HBase
connect.hbase.sink.column.family
toconnect.hbase.column.family
- Rename HBase
connect.hbase.sink.kcql
toconnect.hbase.kcql
- Rename HBase
connect.hbase.sink.error.policy
toconnect.hbase.error.policy
- Rename HBase
connect.hbase.sink.max.retries
toconnect.hbase.max.retries
- Rename HBase
connect.hbase.sink.retry.interval
toconnect.hbase.retry.interval
- Rename Influx
connect.influx.sink.kcql
toconnect.influx.kcql
- Rename Influx
connect.influx.connection.user
toconnect.influx.username
- Rename Influx
connect.influx.connection.password
toconnect.influx.password
- Rename Influx
connect.influx.connection.database
toconnect.influx.db
- Rename Influx
connect.influx.connection.url
toconnect.influx.url
- Rename Kudu
connect.kudu.sink.kcql
toconnect.kudu.kcql
- Rename Kudu
connect.kudu.sink.error.policy
toconnect.kudu.error.policy
- Rename Kudu
connect.kudu.sink.retry.interval
toconnect.kudu.retry.interval
- Rename Kudu
connect.kudu.sink.max.retries
toconnect.kudu.max.reties
- Rename Kudu
connect.kudu.sink.schema.registry.url
toconnect.kudu.schema.registry.url
- Rename Redis
connect.redis.connection.password
toconnect.redis.password
- Rename Redis
connect.redis.sink.kcql
toconnect.redis.kcql
- Rename Redis
connect.redis.connection.host
toconnect.redis.host
- Rename Redis
connect.redis.connection.port
toconnect.redis.port
- Rename ReThink
connect.rethink.source.host
toconnect.rethink.host
- Rename ReThink
connect.rethink.source.port
toconnect.rethink.port
- Rename ReThink
connect.rethink.source.db
toconnect.rethink.db
- Rename ReThink
connect.rethink.source.kcql
toconnect.rethink.kcql
- Rename ReThink Sink
connect.rethink.sink.host
toconnect.rethink.host
- Rename ReThink Sink
connect.rethink.sink.port
toconnect.rethink.port
- Rename ReThink Sink
connect.rethink.sink.db
toconnect.rethink.db
- Rename ReThink Sink
connect.rethink.sink.kcql
toconnect.rethink.kcql
- Rename JMS
connect.jms.user
toconnect.jms.username
- Rename JMS
connect.jms.source.converters
toconnect.jms.converters
- Remove JMS
connect.jms.converters
and replace my kcqlwithConverters
- Remove JMS
connect.jms.queues
and replace my kcqlwithType QUEUE
- Remove JMS
connect.jms.topics
and replace my kcqlwithType TOPIC
- Rename Mqtt
connect.mqtt.source.kcql
toconnect.mqtt.kcql
- Rename Mqtt
connect.mqtt.user
toconnect.mqtt.username
- Rename Mqtt
connect.mqtt.hosts
toconnect.mqtt.connection.hosts
- Remove Mqtt
connect.mqtt.converters
and replace my kcqlwithConverters
- Remove Mqtt
connect.mqtt.queues
and replace my kcqlwithType=QUEUE
- Remove Mqtt
connect.mqtt.topics
and replace my kcqlwithType=TOPIC
- Rename Hazelcast
connect.hazelcast.sink.kcql
toconnect.hazelcast.kcql
- Rename Hazelcast
connect.hazelcast.sink.group.name
toconnect.hazelcast.group.name
- Rename Hazelcast
connect.hazelcast.sink.group.password
toconnect.hazelcast.group.password
- Rename Hazelcast
connect.hazelcast.sink.cluster.members
tpconnect.hazelcast.cluster.members
- Rename Hazelcast
connect.hazelcast.sink.batch.size
toconnect.hazelcast.batch.size
- Rename Hazelcast
connect.hazelcast.sink.error.policy
toconnect.hazelcast.error.policy
- Rename Hazelcast
connect.hazelcast.sink.max.retries
toconnect.hazelcast.max.retries
- Rename Hazelcast
connect.hazelcast.sink.retry.interval
toconnect.hazelcast.retry.interval
- Rename VoltDB
connect.volt.sink.kcql
toconnect.volt.kcql
- Rename VoltDB
connect.volt.sink.connection.servers
toconnect.volt.servers
- Rename VoltDB
connect.volt.sink.connection.user
toconnect.volt.username
- Rename VoltDB
connect.volt.sink.connection.password
toconnect.volt.password
- Rename VoltDB
connect.volt.sink.error.policy
toconnect.volt.error.policy
- Rename VoltDB
connect.volt.sink.max.retries
toconnect.volt.max.retries
- Rename VoltDB
connect.volt.sink.retry.interval
toconnect.volt.retry.interval
0.2.5 (8 Apr 2017)
- Added Azure DocumentDB Sink Connector
- Added JMS Source Connector.
- Added UPSERT to Elastic Search
- Support Confluent 3.2 and Kafka 0.10.2.
- Cassandra improvements
withunwrap
- Upgrade to Kudu 1.0 and CLI 1.0
- Add ingest_time to CoAP Source
- InfluxDB bug fixes for tags and field selection.
- Added Schemaless Json and Json with schema support to JMS Sink.
- Support for Cassandra data type of
timestamp
in the Cassandra Source for timestamp tracking.
0.2.4 (26 Jan 2017)
- Added FTP and HTTP Source.
- Added InfluxDB tag support. KCQL: INSERT INTO targetdimension
SELECT * FROM influx-topic WITHTIMESTAMP sys_time() WITHTAG(field1, CONSTANT_KEY1=CONSTANT_VALUE1, field2,CONSTANT_KEY2=CONSTANT_VALUE1)
- Added InfluxDb consistency level. Default is
ALL
. Useconnect.influx.consistency.level
to set it to ONE/QUORUM/ALL/ANY - InfluxDb
connect.influx.sink.route.query
was renamed toconnect.influx.sink.kcql
- Added support for multiple contact points in Cassandra
0.2.3 (5 Jan 2017)
- Added CoAP Source and Sink.
- Added MongoDB Sink.
- Added MQTT Source.
- Hazelcast support for ring buffers.
- Redis support for Sorted Sets.
- Added start scripts.
- Added Kafka Connect and Schema Registry CLI.
- Kafka Connect CLI now supports pause/restart/resume; checking connectors on the classpath and validating configuration of connectors.
- Support for
Struct
,Schema.STRING
andJson
with schema in the Cassandra, ReThinkDB, InfluxDB and MongoDB sinks. - Rename
export.query.route
tosink.kcql
. - Rename
import.query.route
tosource.kcql
. - Upgrade to KCQL 0.9.5 - Add support for
STOREAS
so specify target sink types, e.g. Redis Sorted Sets, Hazelcast map, queues, ringbuffers.
Building
Requires gradle 6.0 to build.
To build
gradle compile
To test
gradle test
To create a fat jar
gradle shadowJar
You can also use the gradle wrapper
./gradlew shadowJar
To view dependency trees
gradle dependencies # or
gradle :kafka-connect-cassandra:dependencies
To build a particular project
gradle :kafka-connect-elastic5:build
To create a jar of a particular project:
gradle :kafka-connect-elastic5:shadowJar
Contributing
We'd love to accept your contributions! Please use GitHub pull requests: fork the repo, develop and test your code, semantically commit and submit a pull request. Thanks!