JournalDB

Low level journal database

GroupId: com.picoff
ArtifactId: journaldb
Last Version: 1.0.3
Type: jar
Description: Low level journal database
Project URL: https://github.com/picoff/journaldb
Project Organization: Picoff Ventures
Source Code Management: http://github.com/picoff/journaldb/tree/master

How to add to project

Maven

<!-- https://jarcasting.com/artifacts/com.picoff/journaldb/ -->
<dependency>
    <groupId>com.picoff</groupId>
    <artifactId>journaldb</artifactId>
    <version>1.0.3</version>
</dependency>

Gradle (Groovy DSL)

// https://jarcasting.com/artifacts/com.picoff/journaldb/
implementation 'com.picoff:journaldb:1.0.3'

Gradle (Kotlin DSL)

// https://jarcasting.com/artifacts/com.picoff/journaldb/
implementation ("com.picoff:journaldb:1.0.3")

Buildr

'com.picoff:journaldb:jar:1.0.3'

Ivy

<dependency org="com.picoff" name="journaldb" rev="1.0.3">
  <artifact name="journaldb" type="jar" />
</dependency>

Grape

@Grapes(
    @Grab(group='com.picoff', module='journaldb', version='1.0.3')
)

SBT

libraryDependencies += "com.picoff" % "journaldb" % "1.0.3"

Leiningen

[com.picoff/journaldb "1.0.3"]

Dependencies

compile (2)

Group / Artifact Type Version
org.slf4j : slf4j-api jar 1.8.0-beta2
com.picoff : commons jar 1.17.3

test (3)

Group / Artifact Type Version
junit : junit jar 4.12
com.google.truth : truth jar 0.42
org.slf4j : slf4j-simple jar 1.8.0-beta2

Project Modules

There are no modules declared in this project.

JournalDB

Low level embedded journaling database for JVM 1.8+

About

JournalDB is a low level, append-only journaling database. It is useful for storing and retrieving binary operation logs (oplogs), store failure recovery records, or any free-form binary data. It was designed as the default storage backend for Geoste Server fail-to-write event logs, but it can be used to persist pretty much any kind of binary data.

Installation

JournalDB is available in Maven Central.

<dependency>
    <groupId>com.picoff</groupId>
    <artifactId>journaldb</artifactId>
    <version>VERSION</version>
</dependency>

Usage example

import com.picoff.commons.unit.DigitalUnit;
import com.picoff.journaldb.*;

import java.io.File;
import java.io.IOException;

class Main {
    public static void main(final String[] args) throws IOException {
        final JournalDBOptions options = new JournalDBOptions();

        options.setDataDirectory(new File("./database/"));
        options.setJournalMaxSize(5, DigitalUnit.GIGABYTE); // Automatically relocate if the current journal reaches this size (approximately)
        options.setRelocateOnBootFailure(false); // Relocate if the current active journal was not closed properly
        options.setRelocateOnWriteFailure(false);  // Relocate on failing to write a journal

        final JournalDB journalDB = new JournalDB(options);

        journalDB.write("some data".getBytes()); // Without sync to hardware
        journalDB.write("some sync data".getBytes(), true); // With sync to hardware

        // ........

        // Relocate and archive the current journal
        final long previousSequence = journalDB.relocate();

        final JournalReaderOptions readerOptions = new JournalReaderOptions();
        readerOptions.setFailOnMagicByte(true); // Fail if journal file does not begin with the magic byte
        readerOptions.setFailOnNotArchived(true); // Fail if attempting to read a non-archived journal
        readerOptions.setFailOnNotClosedGracefully(true); // Fail if journal was not closed gracefully

        final JournalReader reader = journalDB.createReader(readerOptions, previousSequence);

        final EntryReadOptions readOptions = new EntryReadOptions();
        readOptions.setVerifyChecksum(true); // Fail if data read from the disk does not match the stored checksum
        readOptions.setFailOnIntegrityByte(true); // Fail if record does not have integrity byte set to 1
        readOptions.setFailOnMagicByte(true); // Fail if entry does not start with the magic byte

        readOptions.setReadFilter(meta -> {
            // Filter record by its metadata, without reading the actual data.
            return true;
        });

        readOptions.setStartPosition(Journal.FILE_HEADER_SIZE); // Set the position where to start reading from, normally - from the end of the header.

        reader.forEachEntry(readOptions, entry -> {
            final byte[] data = entry.getData();

            try {
                entry.writeProcessedState(true, true); // Optionally, mark record as processed in sync mode.
            } catch (final IOException e) {
                e.printStackTrace();
            }
        });

        journalDB.flush();
        journalDB.close();
    }
}

Failure and recovery

JournalDB has some built-in integrity checks (such as the double write of the integrity bit for journal entries), but as with any database, these are provided on a best-effort basis. A random hardware failure could still corrupt some data, but given the simple binary format of the journal files and the extensive options available for the built-in reader, recovery after a hard failure should be trivial.

Entries of special significance can also be written in "sync" mode, meaning the method invocation returns only once all the data has been written and flushed to the underlying hardware device. Sync mode carries a significant performance penalty.
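To illustrate what flushing to the underlying device means on the JVM, here is a minimal sketch using only java.nio. It is not JournalDB's implementation: the class SyncWriteSketch and its methods are hypothetical names, and the assumption is simply that a sync write corresponds to something like FileChannel.force after the write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of sync vs. non-sync appends; not part of the JournalDB API. */
final class SyncWriteSketch {

    /** Appends data to the channel; when sync is true, blocks until the OS reports the data is on the device. */
    static void append(FileChannel channel, byte[] data, boolean sync) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf); // a single write may be partial, so loop until the buffer is drained
        }
        if (sync) {
            channel.force(true); // flush file content and metadata to the storage device
        }
    }

    /** Writes one non-sync and one sync record to a temporary file and returns the file content. */
    static byte[] demo() {
        Path file = null;
        try {
            file = Files.createTempFile("journal-sketch", ".bin");
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                append(channel, "some data".getBytes(StandardCharsets.UTF_8), false);
                append(channel, "some sync data".getBytes(StandardCharsets.UTF_8), true);
            }
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                try { Files.deleteIfExists(file); } catch (IOException ignored) { }
            }
        }
    }
}
```

The performance penalty comes from the force call: without it, the write can return as soon as the data reaches the operating system's cache.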

Whenever a write to a journal fails, the journal automatically closes itself and rejects any further writes to that journal file. JournalDB also throws an IOException and, depending on the options, either closes the database, preventing any further writes, or attempts to relocate to a new journal on a best-effort basis. If relocation is not possible, JournalDB attempts to close itself, and all future writes fail.

In either case, JournalDB suffers from all the same issues that can be attributed to any single node, single point of failure setup. For mission critical data, we suggest JournalDB be only used as an additional layer of persistence, in addition to replicated and distributed systems, for example, Apache Kafka or Apache BookKeeper.

Journal states

Journals can be either "active" or "archived". Active journals are the ones currently being written. Archived journals are those where all write operations are complete, so they are safe to read.

You should not read non-archived journals (ones that have not been relocated out of the current JournalDB instance). You can force reading of such a file via options, but doing so can lead to unpredictable results.

Entry order

Entries are guaranteed to be written in the order of allocation, even when written in parallel. This is because each journal contains an internal sequence number, and sequence numbers are allocated synchronously, one by one. Entry order within one journal file is therefore guaranteed to be absolute.
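The idea can be sketched with a small in-memory log. This is an assumption about the general technique, not JournalDB's actual code: the class SequencedLogSketch is hypothetical, and it allocates the sequence number and appends the entry inside the same critical section, so the stored order always matches the allocation order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a journal; not part of the JournalDB API. */
final class SequencedLogSketch {
    private final List<Long> written = new ArrayList<>();
    private long nextSequence = 0;

    /** Allocates the next sequence number and records the entry under one lock. */
    synchronized long append() {
        long sequence = nextSequence++;
        written.add(sequence); // stands in for writing the entry to the journal file
        return sequence;
    }

    /** Returns a copy of the sequence numbers in the order they were written. */
    synchronized List<Long> snapshot() {
        return new ArrayList<>(written);
    }

    /** Appends from several threads at once and returns the resulting write order. */
    static List<Long> runParallel(int threads, int perThread) {
        SequencedLogSketch log = new SequencedLogSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    log.append();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return log.snapshot();
    }
}
```

Which thread obtains which sequence number is not deterministic, but the written order is always strictly increasing with no gaps.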

Reading entries and marking entries as processed

JournalDB lets you mark journal entries as "processed" while you iterate over and read journal files. Conceptually, this can mark entries as recovered, or it can mean anything else: the meaning of this flag is defined by the user.

Marking an entry as processed writes a marker byte (the entry processing flag) and a timestamp to the journal file (even an archived one!); both can be retrieved when iterating later.

This feature is useful, for example, to mark records "processed" or "recovered" in some scope.
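As a rough sketch of what such an in-place update could look like at the file level, the following uses the entry layout from the "Journal file format" section, where the processing flag sits at offset 22 of an entry and the processing timestamp at offset 23. Several things here are assumptions rather than documented behaviour: the class ProcessedMarkerSketch is hypothetical, the flag value 1 is assumed to mean "processed", and the timestamp is assumed to be stored big-endian.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of marking an entry as processed; not part of the JournalDB API. */
final class ProcessedMarkerSketch {
    static final int FLAG_OFFSET = 22; // the timestamp follows directly at offset 23

    /** Overwrites the flag and timestamp fields of the entry that starts at entryStart. */
    static void markProcessed(FileChannel channel, long entryStart,
                              long processedAtMillis, boolean sync) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
        buf.put((byte) 1);              // ASSUMPTION: 1 means "processed"
        buf.putLong(processedAtMillis); // ASSUMPTION: big-endian, the ByteBuffer default
        buf.flip();
        long position = entryStart + FLAG_OFFSET;
        while (buf.hasRemaining()) {
            position += channel.write(buf, position); // positional write, file length is unchanged
        }
        if (sync) {
            channel.force(false);
        }
    }

    /** Builds a zeroed file holding a 100-byte header and one 48-byte entry metadata block, then marks the entry. */
    static byte[] demo() {
        Path file = null;
        try {
            file = Files.createTempFile("journal-sketch", ".bin");
            Files.write(file, new byte[100 + 48]);
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                markProcessed(channel, 100, 1234L, true);
            }
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                try { Files.deleteIfExists(file); } catch (IOException ignored) { }
            }
        }
    }
}
```

In real code, entry.writeProcessedState from the usage example is the supported way to do this; the sketch only shows why the operation can work on archived journals without rewriting them.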

Journal file format

JournalDB stores data in files called journals. A journal is a binary file that follows this format:

Header

The header contains journal file metadata and is 100 bytes long.

Offset  Length  Description
0       1       Magic byte, character 'j'
1       1       Archive marker, 0 or 1. Indicates whether this journal is "archived" and can no longer be written to.
2       1       Graceful close marker, 0 or 1. Indicates whether this file has been closed gracefully.
3       8       UNIX timestamp of when this journal was created, milliseconds since epoch
11      8       UNIX timestamp of when this journal was archived, milliseconds since epoch
19      8       Next available entry sequence number
27      8       Last known write position, offset in bytes from the start of the file
35      65      Reserved for future use
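The header layout above can be decoded with a few absolute reads on a ByteBuffer. The following is a minimal sketch, not the library's reader: JournalHeaderSketch is a hypothetical class, and it assumes multi-byte fields are stored big-endian (the Java default), which the format description does not state.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical decoder for the 100-byte journal header; not part of the JournalDB API. */
final class JournalHeaderSketch {
    static final int HEADER_SIZE = 100;

    final boolean archived;
    final boolean closedGracefully;
    final long createdAtMillis;
    final long archivedAtMillis;
    final long nextSequence;
    final long lastWritePosition;

    private JournalHeaderSketch(ByteBuffer buf) {
        archived = buf.get(1) == 1;          // offset 1: archive marker
        closedGracefully = buf.get(2) == 1;  // offset 2: graceful close marker
        createdAtMillis = buf.getLong(3);    // offset 3: creation timestamp
        archivedAtMillis = buf.getLong(11);  // offset 11: archive timestamp
        nextSequence = buf.getLong(19);      // offset 19: next entry sequence number
        lastWritePosition = buf.getLong(27); // offset 27: last known write position
        // offsets 35..99 are reserved and ignored here
    }

    /** Parses the first 100 bytes of a journal file. */
    static JournalHeaderSketch parse(byte[] header) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("header is shorter than " + HEADER_SIZE + " bytes");
        }
        if (header[0] != 'j') {
            throw new IllegalArgumentException("missing magic byte 'j'");
        }
        // ASSUMPTION: big-endian byte order; verify against real journal files before relying on it
        return new JournalHeaderSketch(ByteBuffer.wrap(header).order(ByteOrder.BIG_ENDIAN));
    }
}
```

A recovery tool could use the archive and graceful close markers to decide whether a journal is safe to read before touching the entry region.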

Entry region

The bytes after the header region form the so-called entry space. This region contains the variable-length entries and their metadata written to the journal.

Each entry consists of 48 bytes of metadata, a variable-length data region, and an 8-byte CRC checksum.

The maximum length of the data stored per record is bounded by the maximum size of a byte[] on the JVM, minus roughly 100 bytes, depending on the JVM implementation used.

Offset  Length  Description
0       1       Magic byte, 'r'
1       1       Record integrity marker, 0 or 1. Normally always 1, unless the record got corrupted on write somehow.
2       4       Record length in bytes (integer), N
6       8       Record sequence number
14      8       UNIX timestamp of when this record was created, milliseconds since epoch
22      1       Entry processing flag (see above)
23      8       Entry processing timestamp (see above)
31      17      Reserved for future use
48      N       Data
48 + N  8       CRC32 checksum of the data in the record
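To make the entry layout concrete, here is a minimal encode and decode sketch built on java.util.zip.CRC32. It is not the library's writer or reader: JournalEntrySketch is a hypothetical class, and it assumes big-endian fields and that the 8-byte checksum slot holds the CRC32 value as a Java long.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical encoder and decoder for a single journal entry; not part of the JournalDB API. */
final class JournalEntrySketch {
    static final int METADATA_SIZE = 48;
    static final int CHECKSUM_SIZE = 8;

    /** Builds the on-disk bytes of one entry: 48 bytes of metadata, the data, then the checksum. */
    static byte[] encode(long sequence, long createdAtMillis, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(METADATA_SIZE + data.length + CHECKSUM_SIZE);
        buf.put(0, (byte) 'r');           // magic byte
        buf.put(1, (byte) 1);             // integrity marker
        buf.putInt(2, data.length);       // record length
        buf.putLong(6, sequence);         // record sequence number
        buf.putLong(14, createdAtMillis); // creation timestamp
        // offsets 22..47 (processing flag, processing timestamp, reserved) stay zero
        buf.position(METADATA_SIZE);
        buf.put(data);
        buf.putLong(checksum(data));      // ASSUMPTION: CRC32 value stored as a big-endian long
        return buf.array();
    }

    /** Validates the magic byte, integrity marker, length and checksum, then returns the data. */
    static byte[] decode(byte[] entry) {
        if (entry.length < METADATA_SIZE + CHECKSUM_SIZE) {
            throw new IllegalStateException("entry is too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(entry);
        if (buf.get(0) != 'r') throw new IllegalStateException("missing magic byte 'r'");
        if (buf.get(1) != 1) throw new IllegalStateException("integrity marker is not 1");
        int length = buf.getInt(2);
        if (length < 0 || (long) METADATA_SIZE + length + CHECKSUM_SIZE > entry.length) {
            throw new IllegalStateException("record length out of bounds");
        }
        byte[] data = new byte[length];
        buf.position(METADATA_SIZE);
        buf.get(data);
        if (buf.getLong() != checksum(data)) throw new IllegalStateException("checksum mismatch");
        return data;
    }

    /** CRC32 of the data region only, as described in the entry table. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```

Under these assumptions, a 9-byte payload such as "some data" occupies 48 + 9 + 8 = 65 bytes in the entry region.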

State of the library

JournalDB is considered ready for production use. However, it is a relatively new library and should be treated as such: some things might still need ironing out.

License

Apache License, version 2.0

Versions

Version
1.0.3
1.0.2
1.0.1