cassandra-hibernate-orm-dialect
Hibernate ORM Cassandra Dialect and JDBC Driver
The JDBC Driver is designed specifically for the hibernate integration, yet still can be used with the following tools:
Contents
- Description
- Maven artifacts
- Supported Versions
- Reasons
- Cassandra model example
- Implementation Details
- Running Tests
Description
This is an open-source JDBC driver and Hibernate ORM Dialect for Cassandra.
maven
<dependency>
<groupId>com.antonyudin.cassandra</groupId>
<artifactId>ay-cassandra-jdbc-driver</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>com.antonyudin.cassandra</groupId>
<artifactId>ay-hibernate-cassandra-dialect</artifactId>
<version>1.1</version>
</dependency>
Supported versions
The goal is to support the latest versions of Java, Hibernate, Cassandra, and Datastax driver.
- Java 9, 10, 11
- Cassandra 3.11.x, 4.0
- Hibernate 5.3.x
- Datastax Driver 3.4.0
Reasons
There are several open source Cassandra JPA implementations out there, yet none of them support critical features that I often use in my JPA/JEE projects. For example:
- Hibernate OGM
- Does not support Criteria API (Not just for Cassandra. No Criteria API support at all)
- Kundera
- Cassandra scalar native queries return a
List<Map<String, Object>>
orList<RandomEntity>
instead ofList<Object[]>
- No JPQL support for
NEW className(..)
inSELECT
clause - Selecting only
@EmbeddedId
property with JPQL fails - No proper support for polymorphic queries
- Criteria API - support for
@EmbeddedId
- cannot usePath<Entity>
to access embedded key and lots of other issues of non-JPA compliance
- Cassandra scalar native queries return a
One of the benefits of using JPA is a potential to switch to a different database provider without changing much of the code. I wanted to have a complex (not just a numeric ID and a couple of String attributes) JPA model and a Criteria based queries for that model that can be deployed to Cassandra, PostgreSQL, and MariaDB.
Here are the requirements for a complex model:
- Inheritance with support for polymorphic queries
- OneToMany and ManyToOne relations that can be queried through Criteria API.
- Composite (Embeddable) primary keys.
- Support for an Entity that has a primary key that references another Entity.
- Support for
java.util.UUID
. - Support for an
@Embeddable
ID that aggregates another@Embeddable
instance.
The model also has to make sense for a sharded/distributed database.
The jee-application/model/src/main/java
folder contains an example of a model that works for Cassandra, PostgreSQL, and MariaDB.
Cassandra model example
Lets take a common example - users and their posts. One user can have multiple posts. We should be able to:
- search users by their ID.
- search posts by their ID.
- search posts by their ID and the date of the post.
- when we find a post in the previous search, get access to the user ID and name without an extra query
- find all post IDs, titles and creation date posted by a user and do it efficiently - without quering the whole cluster.
- get access to the full post entity after we did a previous search by using an entity property.
here are the tables in cassandra:
CREATE TABLE jee.users (
identity uuid PRIMARY KEY,
dateofbirth date,
name text
)
CREATE TABLE jee.posts (
identity uuid PRIMARY KEY,
author_identity uuid,
author_name text,
content text,
created timestamp,
title text
)
CREATE TABLE jee.user_posts (
user_identity uuid,
post_identity uuid,
post_created timestamp,
title text,
PRIMARY KEY (user_identity, post_created, post_identity)
) WITH CLUSTERING ORDER BY (post_created DESC, post_identity ASC)
To design java classes for this model, lets devide the information about a user into two levels - basic (identity, name) and full (identity, name, dateOfBirth). Lets do the same for the posts - basic (identity, created, title) and full (identity, create, title, content). The "full" version of the class inherits the "basic" version.
Here are the classes:
The actual User entity class inherits the Full version and adds a @OneToMany
association with the UserPost entity. The UserPost entity is a de-normalized version of the User-Post association. The UserPost entity has a reference to both User and Post and also contains the Basic information about the Post (created date and title, but no content of the post). The Post has a reference to the user and contains user's Basic information (name). The UserPost entity allows us to find all posts by a user and get the identity, created date, and title of the post in one efficient query.
Here are the classses:
Finding a user looks trivial:
final User user = entityManager.find(User.class, identity);
Getting all posts with their "Basic" information in one efficient query looks like:
for (UserPost post: user.getPosts()) {
logger.info("\tpost.id.postIdentity: [" + post.getId().getPostIdentity() + "]");
logger.info("\tpost.created: [" + post.getPostBasic().getCreated() + "]");
logger.info("\tpost.title: [" + post.getPostBasic().getTitle() + "]");
}
Getting the content for the most recent user post looks like that:
final String content = user.getPosts().get(0).getPost().getContent();
Fetching user's posts limited by a date range using the Criteria API looks like this:
final CriteriaQuery<UserPost> criteria = entityManager.getCriteriaBuilder().createQuery(
UserPost.class
);
final Root<UserPost> items = criteria.from(UserPost.class);
criteria.select(items);
criteria.where(
entityManager.getCriteriaBuilder().and(
entityManager.getCriteriaBuilder().equal(
items.get(UserPost_.id).get(UserPostId_.userIdentity),
userIdentity
),
entityManager.getCriteriaBuilder().greaterThanOrEqualTo(
items.get(UserPost_.id).get(UserPostId_.postCreated),
dateFrom
),
entityManager.getCriteriaBuilder().lessThanOrEqualTo(
items.get(UserPost_.id).get(UserPostId_.postCreated),
dateTo
)
)
);
final List<UserPost> foundPosts = entityManager.createQuery(
criteria
).getResultList();
Implementation Details
This repository contains two main parts - Cassandra JDBC driver and Cassandra Hibernate ORM dialect.
The Cassandra JDBC driver is limited to the functionality that is used by hibernate. In other words, the JDBC driver implements the only methods that are used by Hibernate. Surprisingly, Hibernate uses only a small part of the JDBC API. This JDBC driver uses DataStax driver to communicate with the Cassandra nodes.
Even though JPA was designed to be used primarily with relational databases, it still works well with Cassandra and Cassandra's CQL. Of course, when designing a JPA model, one should keep in mind the limitations of Cassandra.
Here are some tips:
- Use
@Embeddable
and@EmbeddedId
to model primary keys. The order of the fields that are part of the primary key are important for Cassandra. This implementation does not have a mechanism to control the order of the fields in the primary key, but that should not be a problem. After hibernate generated the schema, one could grab the table structure usingdescribe TABLE
CQL command and change the order of the fields in the primary key and re-create the table.