śiva format शिव for the JVM
This library is a Java implementation of the siva format. It is intended to be used with any JVM language. The main implementation is written in Go.
This Java library offers an API to read and unpack siva files; writing them is not supported yet.
Usage
siva-java is available on Maven Central. To include it as a dependency in a project managed by sbt, add the dependency to your build.sbt file:
libraryDependencies += "tech.sourced" % "siva-java" % "[version]"
On the other hand, if you use Maven to manage your dependencies, add the dependency to your pom.xml:
<dependency>
<groupId>tech.sourced</groupId>
<artifactId>siva-java</artifactId>
<version>[version]</version>
</dependency>
If you use Gradle to manage your dependencies, add the following to the dependencies section of your build.gradle file:
compile 'tech.sourced:siva-java:[version]'
In all cases, replace [version] with the latest siva-java version.
Example of Usage
package com.github.mcarmonaa.sivaexample;
import org.apache.commons.io.FileUtils;
import tech.sourced.siva.IndexEntry;
import tech.sourced.siva.SivaReader;
import java.io.File;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Main {
    private static final String SIVA_DIR = "/tmp/siva-files/";
    private static final String SIVA_UNPACKED_DIR = "/tmp/siva-unpacked/";
    // SIVA_DIR already ends with a slash, so the file name is appended directly.
    private static final String DEFAULT_SIVA_FILE = SIVA_DIR + "aac052c42c501abf6aa8c3509424e837bb27e188.siva";
    private static final Logger LOGGER = Logger.getLogger(Main.class.getName());

    public static void main(String[] args) {
        LOGGER.log(Level.INFO, "unpacking siva-file");
        try (SivaReader sivaReader = new SivaReader(new File(DEFAULT_SIVA_FILE))) {
            // The filtered index lists the entries to unpack.
            List<IndexEntry> index = sivaReader.getIndex().getFilteredIndex().getEntries();
            for (IndexEntry indexEntry : index) {
                // Copy the content of each entry to a file under SIVA_UNPACKED_DIR.
                InputStream entry = sivaReader.getEntry(indexEntry);
                Path outPath = Paths.get(SIVA_UNPACKED_DIR.concat(indexEntry.getName()));
                FileUtils.copyInputStreamToFile(entry, new File(outPath.toString()));
            }
        } catch (Exception ex) {
            LOGGER.log(Level.SEVERE, ex.toString(), ex);
        }
    }
}
Development
Build
To build the project and generate a jar file:
make build
It leaves the jar file at ./target/siva-java-[version].jar, where [version] is the version specified in build.sbt.
Tests
Just run:
make test
Clean
To clean the project:
make clean
Limitations
This section lists some known limitations and implementation divergences with respect to the main siva reference specification.
All the issues described below concern the index part of the blocks, since that is where siva places the metadata. Most of the meta-information is encoded as unsigned values, so most of the problems come from the lack of unsigned types in the JVM.
To avoid these limitations, in some cases a cast to a bigger number type followed by a binary AND operation with a mask solves the problem. The trick works as follows:
unsigned int8 (byte in Go): 255
If you read this byte in Java, the value is interpreted as signed, so the same bits in Java result in:
signed int8 (byte in Java): -1
Casting this value to a Java int keeps the value as -1 because the sign is extended, so we apply a binary mask with the least significant byte set to all ones and the remaining bytes set to zeros:
byte b = readByte(); // the byte 0xFF (255) was read, but in Java its value is -1
int mask = 0x000000FF;
int n = b & mask; // n is now an int holding the value 255
This procedure follows from how the JVM encodes integer values using two's complement, and it applies to every type that can be cast to a bigger number type.
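The masking trick can be sketched for both uint8 and uint32 values as follows. The class and method names are hypothetical and are not part of the siva-java API; the JDK also ships equivalent helpers (Byte.toUnsignedInt and Integer.toUnsignedLong, available since Java 8).

```java
// Hypothetical helper for illustration only; not part of siva-java.
final class UnsignedReads {
    private UnsignedReads() {
    }

    /** Widens a byte that holds a uint8 to an int in the range [0, 255]. */
    static int uint8ToInt(byte b) {
        // The byte is sign-extended to int first; the mask clears the upper 24 bits.
        return b & 0xFF;
    }

    /** Widens an int that holds a uint32 to a long in the range [0, 2^32 - 1]. */
    static long uint32ToLong(int i) {
        // The mask must be a long literal, otherwise the AND happens on 32 bits.
        return i & 0xFFFFFFFFL;
    }
}
```

For instance, UnsignedReads.uint8ToInt((byte) 0xFF) gives 255 and UnsignedReads.uint32ToLong(-1) gives 4294967295.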
Unsigned Integer 64 Limitation!: a siva file can contain, in the fields that the specification encodes as uint64, values in the range [0, 2^64 - 1], while the Java implementation only supports values in the range [0, 2^63 - 1]. There is no primitive number type bigger than a long (int64) in Java, so this can't be avoided.
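A minimal sketch of how such a field could be read and rejected when it falls outside the supported range. The class and method names are hypothetical, and the choice of IllegalArgumentException is an assumption, not necessarily what siva-java throws.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of siva-java.
final class Uint64Reads {
    private Uint64Reads() {
    }

    /**
     * Reads 8 bytes as a long and rejects values above Long.MAX_VALUE,
     * which show up as negative numbers because the top bit is the sign bit.
     */
    static long readUint64AsLong(ByteBuffer buffer) {
        long value = buffer.getLong();
        if (value < 0) {
            // Long.toUnsignedString prints the bits as the unsigned value they encode.
            throw new IllegalArgumentException(
                    "unsupported uint64 value: " + Long.toUnsignedString(value));
        }
        return value;
    }
}
```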
The following parts of the index are affected by these issues:
- Index Signature: The reference specification says that a sequence of three bytes (IBA) is used as the signature. In the Go reference implementation a byte is a uint8, while in Java a byte is an int8. The current Java implementation needs no special handling here, since all three bytes have values below 128 and are therefore read correctly.
- Index Entry:
  - UNIX mode: encoded as uint32, so the Java implementation casts it to a long.
  - Offset of the file content, relative to the beginning of the block: a uint64 value, so the implementation reads it as a long and checks that it is not negative. Unsigned Integer 64 Limitation!
  - Size of the file content: encoded as a uint64; the implementation checks that it is not negative. Unsigned Integer 64 Limitation!
  - CRC32: uint32 value cast to a Java long.
  - Flags: uint32 value, read without a cast since it can only contain the values 0 (No Flags) or 1 (Deleted).
- Index Footer:
  - Number of entries in the block: uint32 value cast to a Java long.
  - Index size in bytes: uint64 value that can't be cast to a bigger type; the implementation checks that it is not negative. Unsigned Integer 64 Limitation!
  - Block size in bytes: uint64 value that can't be cast to a bigger type; the implementation checks that it is not negative. Unsigned Integer 64 Limitation!
  - CRC32: uint32 value cast to a Java long.
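To make the footer handling concrete, here is a rough sketch of decoding the four Index Footer fields from a byte array. It assumes the fields appear in the order listed above, occupy 24 bytes in total, and are stored big-endian; those layout details and all names in the code are assumptions for illustration, so check them against the siva specification before relying on them.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the class siva-java uses internally.
final class FooterSketch {
    final long entryCount; // uint32 widened to long
    final long indexSize;  // uint64, limited to [0, 2^63 - 1]
    final long blockSize;  // uint64, limited to [0, 2^63 - 1]
    final long crc32;      // uint32 widened to long

    private FooterSketch(long entryCount, long indexSize, long blockSize, long crc32) {
        this.entryCount = entryCount;
        this.indexSize = indexSize;
        this.blockSize = blockSize;
        this.crc32 = crc32;
    }

    /** Decodes a footer, assuming 4 + 8 + 8 + 4 bytes in big-endian order. */
    static FooterSketch parse(byte[] raw) {
        if (raw.length != 24) {
            throw new IllegalArgumentException("expected 24 bytes, got " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
        long entryCount = buf.getInt() & 0xFFFFFFFFL;
        long indexSize = requireNonNegative(buf.getLong(), "index size");
        long blockSize = requireNonNegative(buf.getLong(), "block size");
        long crc32 = buf.getInt() & 0xFFFFFFFFL;
        return new FooterSketch(entryCount, indexSize, blockSize, crc32);
    }

    private static long requireNonNegative(long value, String field) {
        if (value < 0) {
            throw new IllegalArgumentException(field + " does not fit in a signed long");
        }
        return value;
    }
}
```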
Other comments: This Java implementation verifies the integrity of the index with the CRC in the Index Footer. Verifying the integrity of the files themselves is optional and left to the clients of this library, using the CRC kept in each Index Entry.
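A client-side check could look like the sketch below. It assumes that the CRC stored in the Index Entry is a standard CRC-32 as computed by java.util.zip.CRC32, and it takes the expected value as a plain long because the accessor that exposes it on IndexEntry is not shown in this document. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of siva-java.
final class EntryChecksum {
    private EntryChecksum() {
    }

    /** Computes the CRC-32 of everything remaining in the stream. */
    static long crc32Of(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            crc.update(buffer, 0, read);
        }
        // getValue() returns the checksum as a non-negative long, i.e. already "unsigned".
        return crc.getValue();
    }

    /** Returns true when the stream content matches the expected CRC-32. */
    static boolean matches(InputStream in, long expectedCrc32) throws IOException {
        return crc32Of(in) == expectedCrc32;
    }
}
```

A client would pass the stream returned by sivaReader.getEntry(indexEntry) together with the CRC value read from that entry.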
License
See LICENSE.