com.instaclustr:sstable-generator-cassandra-4

Tool for creation of Cassandra SSTables programmatically

License	License The Apache License, Version 2.0
Categories	Categories Cassandra Data Databases
GroupId	GroupId com.instaclustr
ArtifactId	ArtifactId sstable-generator-cassandra-4
Last Version	Last Version 1.3
Release Date	Release Date Nov 11, 2020
Type	Type jar
Description	Description Tool for creation of Cassandra SSTables programmatically
Project Organization	Project Organization Instaclustr

Download sstable-generator-cassandra-4

Filename	Size
sstable-generator-cassandra-4-1.3.pom
sstable-generator-cassandra-4-1.3.jar	7 MB
sstable-generator-cassandra-4-1.3-sources.jar	6 KB
sstable-generator-cassandra-4-1.3-javadoc.jar	45 KB
Browse

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.instaclustr/sstable-generator-cassandra-4/ -->
<dependency>
    <groupId>com.instaclustr</groupId>
    <artifactId>sstable-generator-cassandra-4</artifactId>
    <version>1.3</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.instaclustr/sstable-generator-cassandra-4/
implementation 'com.instaclustr:sstable-generator-cassandra-4:1.3'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.instaclustr/sstable-generator-cassandra-4/
implementation ("com.instaclustr:sstable-generator-cassandra-4:1.3")

Apache Buildr

'com.instaclustr:sstable-generator-cassandra-4:jar:1.3'

Apache Ivy

<dependency org="com.instaclustr" name="sstable-generator-cassandra-4" rev="1.3">
  <artifact name="sstable-generator-cassandra-4" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.instaclustr', module='sstable-generator-cassandra-4', version='1.3')
)

Scala SBT

libraryDependencies += "com.instaclustr" % "sstable-generator-cassandra-4" % "1.3"

Leiningen

[com.instaclustr/sstable-generator-cassandra-4 "1.3"]

Dependencies

provided (1)

Group / Artifact	Type	Version
org.apache.cassandra : cassandra-all	jar	4.0-beta3

test (4)

Group / Artifact	Type	Version
com.instaclustr : sstable-generator	jar	1.3
com.github.nosan : embedded-cassandra	jar	3.0.2
junit : junit	jar	4.13.1
org.awaitility : awaitility	jar	3.1.2

Project Modules

There are no modules declared in this project.

Cassandra-SStable-Generator

CLI tool for programmatic generation of Cassandra SSTables

Website: https://www.instaclustr.com/
Documentation: https://www.instaclustr.com/support/documentation/

This tool simply generates SSTables programmatically. It uses Cassandra’s CQLSSTableWriter. After the generation of SSTables is finished, you can load them by sstableloader tool as usual.

The project consists of these modules:

api—impl is coded against this module
impl—the implementation of your population logic, depends on api
generator—the implementation of whole generator CLI application
cassandra-3—the implementation of SSTable generator using internals of Cassandra 3 artifact
cassandra- 4—the implementation of SSTable generator using internals of Cassandra 4 artifact

Build

mvn clean install

Run

Let’s guide you through an example. We want to generate a SSTable by Cassandra 3 API so we can load it to Cassandra afterwards. The components you need to have on a class path are as follows:

generator jar
cassandra-3 module jar
jar with the implementation of your generation logic

java \
  -cp /path/to/impl-1.0.jar:/path/to/generator-1.0.jar:/path/to/cassandra-3.jar \
  com.instaclustr.sstable.generator.LoaderApplication \
  _command_ \
  _arguments_

The concrete example of the invocation would be:

java \
    -cp impl/target/sstable-generator-impl-1.0:generator/target/sstable-generator-1.0.jar:cassandra-3/target/cassandra-3-1.0.jar \
    com.instaclustr.sstable.generator.LoaderApplication \
    fixed \
    --keyspace mykeyspace \
    --table mytable \
    --output-dir=/tmp/output \
    --schema cassandra-3/src/test/resources/cassandra/cql/table.cql \
    --threads 2

Please be aware that you need to have all libraries of Apache Cassandra on the classpath as well. For that reason, please use ./run.sh script and modify it to suit your needs in order to generate SStables.

No command executes default command—help:

Usage: <main class> [-V] COMMAND
  -V, --version   print version information and exit
Commands:
  csv     tool for bulk-loading of data from csv
  random  tool for bulk-loading of random data
  fixed   tool for bulk-loading of fixed data

`random` Command

By this command, you are expected to provide data which represents a row in random fashion.

`csv` Command

csv command has same arguments as random but --file is mandatory. There is supposed to be CSV file which represents rows. Each row will be parsed into a list of strings passed to RowMapper implementation where you have to map them to list of objects for a Cassandra INSERT statement as values.

`fixed` Command

By fixed command, we will generate a SSTable by using the exact list of "rows" with columns. This will be obvious from the documentation which follows.

Row Generation

In order to generate data for all three cases above you have to implement interface com.instaclustr.cassandra.bulkloader.RowMapper in api module. This implementation should be placed in impl (or its equivalent) and it should be on a class path.

RowMapper Interface

public interface RowMapper {

    /**
     * Maps list of strings from whatever input representing
     * a row to list of objects to insert into Cassandra.
     *<p>
     * This method e.g. called upon generation from CSV file.
     *<p>
     * @param row where values are consisting of list of strings
     * @return list of objects to put to insert statement
     */
    List<Object> map(final List<String> row);

    /**
     * Used when we do not want to generate data randomly but we have exact list of what to insert.
     *
     * @return list of rows to be created containing list of cells
     */
    Stream<List<Object>> get();

    /**
     * Logically same as {@link #map(List)} but all data per row
     * needs to be generated inside of the method. The number
     * of items in the returned list has to match number of columns
     * in a row. Each such object represents value which will be
     * passed to Cassandra INSERT statement.
     * <p>
     * This method is called repeatedly. Number of calls
     * is equal to paramter `--numberOfRecords`.
     *<p>
     * This method is called upon "random" generation.
     * @return list of objects to put to insert statement
     */
    List<Object> random();

    /**
     * @return string representation of INSERT INTO statement. Question marks in VALUES are not
     * meant to be replaced.
     * <p>
     * For example: 'INSERT INTO keyspace.table (field1, field2, field3) VALUES (?, ?, ?)'
     */
    String insertStatement();
}

The implementation of RowMapper you are supposed to place on the class path would look like this:

public class RowMapper1 implements RowMapper {


    public static final String KEYSPACE = "mykeyspace";
    public static final String TABLE = "mytable";

    public static final UUID UUID_1 = UUID.randomUUID();
    public static final UUID UUID_2 = UUID.randomUUID();
    public static final UUID UUID_3 = UUID.randomUUID();

    @Override
    public List<Object> map(final List<String> row) {
        return null;
    }

    @Override
    public Stream<List<Object>> get() {
        return Stream.of(
            new ArrayList<Object>() {{
                add(UUID_1);
                add("John");
                add("Doe");
            }},
            new ArrayList<Object>() {{
                add(UUID_2);
                add("Marry");
                add("Poppins");
            }},
            new ArrayList<Object>() {{
                add(UUID_3);
                add("Jim");
                add("Jack");
            }});
    }

    @Override
    public List<Object> random() {
        return null;
    }

    @Override
    public String insertStatement() {
        return format("INSERT INTO %s.%s (id, name, surname) VALUES (?, ?, ?);", KEYSPACE, TABLE);
    }
}

SPI Mechanism

There is a Java SPI mechanism for implementation discovery, so it means that besides implementing API you have to change the impl/src/main/resources/META-INF/services/com.instaclustr.sstable.generator.RowMapper file containing FQCN of your implemenation on one line.

Once the impl jar is placed on the class path, it will be automatically discovered by the generator module so you do not need to use any command-line arguments. Merely putting that JAR on the class path does the job.

The same mechanism works for cassandra-3/4 jar. In case you want to generate jars by CQLSSTableWriter for Cassandra 3, just put that jar on the class path. If you want to generate "Cassandra 4 SSTables", place the respective cassandra-4.jar on the class path as shown above.

In practice this means that you need to compile only an impl module which contains one class so the compilation and JAR building will take literally a few seconds (less than 1 sec here). The command line arguments for all will look the same.

Further Information

See blog by Anup Shirolkar ["A Comprehensive Guide to Cassandra Architecture"](https://www.instaclustr.com/cassandra-architecture/)
See blog by Anup Shirolkar ["Apache Cassandra Compaction Strategies "](https://www.instaclustr.com/apache-cassandra-compaction/)
Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for Instaclustr support status of this project

Instaclustr

Versions

Version
1.3 Nov 11, 2020
1.2 Oct 5, 2020
1.1 Oct 1, 2020

com.instaclustr:sstable-generator-cassandra-4

License

Categories

GroupId

ArtifactId

Last Version

Release Date

Type

Description

Project Organization