jaggr

Simple JSON Aggregator for Java

License	MIT License
GroupId	com.caffinc
ArtifactId	jaggr
Last Version	0.5.0
Release Date	Dec 1, 2016
Type	jar
Description	jaggr: Simple JSON Aggregator for Java
Project URL	https://github.com/caffinc/jaggr
Source Code Management	https://github.com/caffinc/jaggr

Download jaggr

Filename	Size
jaggr-0.5.0.pom
jaggr-0.5.0.jar	15 KB
jaggr-0.5.0-sources.jar	10 KB
jaggr-0.5.0-javadoc.jar	100 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.caffinc/jaggr/ -->
<dependency>
    <groupId>com.caffinc</groupId>
    <artifactId>jaggr</artifactId>
    <version>0.5.0</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.caffinc/jaggr/
implementation 'com.caffinc:jaggr:0.5.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.caffinc/jaggr/
implementation ("com.caffinc:jaggr:0.5.0")

Apache Buildr

'com.caffinc:jaggr:jar:0.5.0'

Apache Ivy

<dependency org="com.caffinc" name="jaggr" rev="0.5.0">
  <artifact name="jaggr" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.caffinc', module='jaggr', version='0.5.0')
)

Scala SBT

libraryDependencies += "com.caffinc" % "jaggr" % "0.5.0"

Leiningen

[com.caffinc/jaggr "0.5.0"]

Dependencies

test (3)

Group / Artifact	Type	Version
com.caffinc : jaggr-utils	jar	0.5.0
org.apache.commons : commons-math3	jar	3.6.1
junit : junit	jar	4.12

Project Modules

There are no modules declared in this project.

jaggr

Simple JSON Aggregator for Java

Usage

Adding dependency

jaggr is published on Bintray and Maven Central. Add jaggr, plus jaggr-utils if you also need the file and iterator helpers described below:

<dependency>
    <groupId>com.caffinc</groupId>
    <artifactId>jaggr</artifactId>
    <version>0.5.0</version>
</dependency>

<dependency>
    <groupId>com.caffinc</groupId>
    <artifactId>jaggr-utils</artifactId>
    <version>0.5.0</version>
</dependency>

Aggregating documents

Assume the following JSON documents are stored in a file called raw.json:

{"_id": 1, "f": "a", "test": {"f": 3}}
{"_id": 2, "f": "a", "test": {"f": 2}}
{"_id": 3, "f": "a", "test": {"f": 1}}
{"_id": 4, "f": "a", "test": {"f": 5}}
{"_id": 5, "f": "a", "test": {"f": -1}}
{"_id": 6, "f": "b", "test": {"f": 1}}
{"_id": 7, "f": "b", "test": {"f": 1}}
{"_id": 8, "f": "b", "test": {"f": 1}}
{"_id": 9, "f": "b", "test": {"f": 1}}
{"_id": 10, "f": "b", "test": {"f": 1}}

Read it in with the JsonFileReader class from the jaggr-utils module:

List<Map<String, Object>> jsonList = JsonFileReader.readJsonFromFile("raw.json");

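Various aggregations can now be defined with the AggregationBuilder. Here, field, avgField, sumField, minField and maxField are the names of the fields to group by and to aggregate: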

Aggregation aggregation = new AggregationBuilder()
                .setGroupBy(field)
                .addOperation("avg", new AverageOperation(avgField))
                .addOperation("sum", new SumOperation(sumField))
                .addOperation("min", new MinOperation(minField))
                .addOperation("max", new MaxOperation(maxField))
                .addOperation("count", new CountOperation())
                .getAggregation();

Aggregation can now be performed using the aggregate() method:

List<Map<String, Object>> result = aggregation.aggregate(jsonList);

Aggregation also supports Iterators:

List<Map<String, Object>> result = aggregation.aggregate(jsonList.iterator());

In fact, aggregate() works with any Iterable<Map<String, Object>>, not just Lists.

The result of the above aggregation would look as follows:

{"_id": "a", "avg": 2.0, "sum": 10, "min": -1, "max": 5, "count": 5}
{"_id": "b", "avg": 1.0, "sum": 5, "min": 1, "max": 1, "count": 5}
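To make the expected output above easy to verify, here is a plain-Java cross-check that computes the same grouping with java.util.stream instead of jaggr. It assumes the group key is the top-level "f" field and the aggregated value is the nested "test.f" field, which matches the numbers shown. The class and method names are hypothetical, and this is not how jaggr itself is implemented.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical cross-check of the expected result; does not use jaggr.
public class GroupByCheck {
    // Build one document shaped like a line of raw.json.
    static Map<String, Object> doc(int id, String f, int testF) {
        Map<String, Object> test = new HashMap<>();
        test.put("f", testF);
        Map<String, Object> d = new HashMap<>();
        d.put("_id", id);
        d.put("f", f);
        d.put("test", test);
        return d;
    }

    // Assumption: the aggregated value lives at test.f.
    @SuppressWarnings("unchecked")
    static int nestedF(Map<String, Object> d) {
        return (Integer) ((Map<String, Object>) d.get("test")).get("f");
    }

    public static List<Map<String, Object>> aggregate(List<Map<String, Object>> docs) {
        // Group documents by "f", keeping first-seen group order.
        Map<Object, List<Map<String, Object>>> groups = docs.stream()
            .collect(Collectors.groupingBy(d -> d.get("f"), LinkedHashMap::new, Collectors.toList()));
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map.Entry<Object, List<Map<String, Object>>> e : groups.entrySet()) {
            // One pass per group yields count, sum, min, max and average.
            IntSummaryStatistics s = e.getValue().stream()
                .mapToInt(GroupByCheck::nestedF).summaryStatistics();
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("_id", e.getKey());
            row.put("avg", s.getAverage());
            row.put("sum", s.getSum());
            row.put("min", s.getMin());
            row.put("max", s.getMax());
            row.put("count", s.getCount());
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {3, 2, 1, 5, -1, 1, 1, 1, 1, 1};
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            docs.add(doc(i + 1, i < 5 ? "a" : "b", values[i]));
        }
        aggregate(docs).forEach(System.out::println);
    }
}
```

Running main prints {_id=a, avg=2.0, sum=10, min=-1, max=5, count=5} followed by {_id=b, avg=1.0, sum=5, min=1, max=1, count=5}, matching the jaggr output above.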

Aggregating other data sources

Aggregating files or Lists of JSON documents works for some use cases, but not all data fits this model.

The jaggr-utils library provides three utilities for aggregating other sources of data.

Aggregating small JSON files in the file system or resources

The JsonFileReader class exposes the readJsonFromFile and readJsonFromResource methods, which read all the JSON objects in a file into memory for aggregation.

Avoid this approach for large files, since every document is held in memory at once.

// Read every document from a file on disk...
List<Map<String, Object>> jsonData = JsonFileReader.readJsonFromFile("afile.json");

// ...or from a file on the classpath (resources)
List<Map<String, Object>> jsonData = JsonFileReader.readJsonFromResource("aFileInResources.json");

List<Map<String, Object>> result = aggregation.aggregate(jsonData);

Aggregating large JSON files or readers

The JsonStringIterator class provides constructors to iterate through a JSON file or a Reader object pointing to an underlying JSON String source without loading all the data into memory.

// Iterate over a file given its path...
Iterator<Map<String, Object>> iterator = new JsonStringIterator("afile.json");

// ...or over any Reader
Iterator<Map<String, Object>> iterator = new JsonStringIterator(new BufferedReader(new FileReader("afile.json")));

List<Map<String, Object>> result = aggregation.aggregate(iterator);
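The streaming idea behind JsonStringIterator can be sketched as a read-ahead iterator over a Reader. LineDocumentIterator below is a hypothetical class, not jaggr's source: it assumes one JSON document per line (as in raw.json above) and leaves the actual JSON parsing to a caller-supplied function, for example one backed by Gson, which jaggr-utils already depends on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: yields one document per non-blank line, holding only the current line in memory.
public class LineDocumentIterator implements Iterator<Map<String, Object>> {
    private final BufferedReader reader;
    private final Function<String, Map<String, Object>> parser; // assumption: caller parses one JSON line
    private String nextLine;

    public LineDocumentIterator(Reader source, Function<String, Map<String, Object>> parser) {
        this.reader = source instanceof BufferedReader ? (BufferedReader) source : new BufferedReader(source);
        this.parser = parser;
        advance();
    }

    // Read ahead to the next non-blank line; close the reader once input is exhausted.
    private void advance() {
        try {
            do {
                nextLine = reader.readLine();
            } while (nextLine != null && nextLine.trim().isEmpty());
            if (nextLine == null) {
                reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public Map<String, Object> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String line = nextLine;
        advance();
        return parser.apply(line); // parse lazily, one document at a time
    }
}
```

Since it is just an Iterator<Map<String, Object>>, an instance of this kind could be handed to aggregation.aggregate(...) like any other iterator.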

Aggregating arbitrary object Iterators

The JsonIterator abstract class provides a way to convert an Iterator from any type to JSON. This can be used to iterate through data coming from arbitrary databases. For example, MongoDB provides Iterable interfaces to the data. You could aggregate an entire collection as follows:

Iterator<Map<String, Object>> iterator = new JsonIterator<DBObject>(mongoCollection.find().iterator()) {
    @Override
    public Map<String, Object> toJson(DBObject element) {
        return element.toMap();
    }
};

List<Map<String, Object>> result = aggregation.aggregate(iterator);
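The adapter pattern behind JsonIterator can be sketched in plain Java as an abstract iterator that wraps an Iterator<T> and converts each element on demand. ConvertingIterator, User and usersAsJson are hypothetical names used for illustration; they are not jaggr's classes, and User merely stands in for a row type returned by some database driver.

```java
import java.util.*;

// Hypothetical sketch of the adapter idea: convert each source element lazily.
abstract class ConvertingIterator<T> implements Iterator<Map<String, Object>> {
    private final Iterator<T> source;

    protected ConvertingIterator(Iterator<T> source) {
        this.source = source;
    }

    // Subclasses decide how one source element becomes a JSON-like map.
    public abstract Map<String, Object> toJson(T element);

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public Map<String, Object> next() {
        return toJson(source.next()); // converts only when requested
    }
}

public class ConvertingIteratorDemo {
    // Stand-in for a row type coming from a database driver.
    static final class User {
        final int id;
        final String team;

        User(int id, String team) {
            this.id = id;
            this.team = team;
        }
    }

    static Iterator<Map<String, Object>> usersAsJson(Iterator<User> users) {
        return new ConvertingIterator<User>(users) {
            @Override
            public Map<String, Object> toJson(User u) {
                Map<String, Object> m = new HashMap<>();
                m.put("_id", u.id);
                m.put("team", u.team);
                return m;
            }
        };
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User(1, "a"), new User(2, "b"));
        usersAsJson(users.iterator()).forEachRemaining(System.out::println);
    }
}
```

Because conversion happens inside next(), only one source element is materialised as a map at a time, which is what makes this approach suitable for large result sets.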

Aggregating batches of data

Starting with version 0.4.0, jaggr supports aggregation of batches of data in a new class called BatchAggregation. The following example shows BatchAggregation in action:

Input Data:

{"_id": 1, "f": "a"}
{"_id": 2, "f": "a"}
{"_id": 3, "f": "a"}
{"_id": 4, "f": "a"}
{"_id": 5, "f": "a"}
{"_id": 6, "f": "b"}
{"_id": 7, "f": "b"}
{"_id": 8, "f": "b"}
{"_id": 9, "f": "b"}
{"_id": 10, "f": "b"}

Aggregation:

BatchAggregation aggregation = new AggregationBuilder()
            .setGroupBy("f")
            .addOperation("count", new CountOperation())
            .getBatchAggregation();

aggregation.aggregateBatch(jsonData);
List<Map<String, Object>> result = aggregation.getFinalResult();

Result:

[
	{"_id":"b","count":5},
	{"_id":"a","count":5}
]

The aggregateBatch() method can be called any number of times as more data arrives. Calls can also be chained:

result = aggregation
			.aggregateBatch(batch1)
			.aggregateBatch(batch2)
			.getFinalResult();

However, getFinalResult() must be called only once, after all batches have been added: it returns the final result of the aggregation and resets the BatchAggregation object, which can then be used to aggregate fresh batches of data.
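The batch behaviour described above (running totals across calls, chaining, and a reset on getFinalResult) can be sketched for the count operation in plain Java. BatchCounter is a hypothetical class that mimics the described contract; it is not jaggr's BatchAggregation and does not support other operations.

```java
import java.util.*;

// Hypothetical sketch of the batch pattern for a single count operation.
public class BatchCounter {
    private final String groupBy;
    private final Map<Object, Long> counts = new LinkedHashMap<>();

    public BatchCounter(String groupBy) {
        this.groupBy = groupBy;
    }

    // Merge one batch into the running totals; returns this so calls can be chained.
    public BatchCounter aggregateBatch(List<Map<String, Object>> batch) {
        for (Map<String, Object> doc : batch) {
            counts.merge(doc.get(groupBy), 1L, Long::sum);
        }
        return this;
    }

    // Emit one {"_id": key, "count": n} row per group, then clear state for fresh batches.
    public List<Map<String, Object>> getFinalResult() {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map.Entry<Object, Long> e : counts.entrySet()) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("_id", e.getKey());
            row.put("count", e.getValue());
            result.add(row);
        }
        counts.clear();
        return result;
    }
}
```

Splitting the ten input documents above into two batches and chaining both calls should still yield a count of 5 for each of "a" and "b", since only the running totals carry over between batches.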

Supported Aggregations

jaggr provides the following aggregations:

Count
Sum
Minimum
Maximum
Average
Collect as List
Collect as Set
First Object
Last Object
Standard Deviation (Population)
Top N Objects
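Operations like these keep a small running state per group. To illustrate how a population standard deviation can be computed in a single streaming pass, here is a sketch using Welford's online algorithm. This is a swapped-in technique chosen for illustration; jaggr's own "Standard Deviation (Population)" operation may compute it differently, and the PopulationStdDev class name is hypothetical.

```java
// Hypothetical sketch: streaming population standard deviation (Welford's online algorithm).
public class PopulationStdDev {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses both the old and the updated mean
    }

    // Population variance divides by n (not n - 1, which would give the sample variance).
    public double result() {
        if (n == 0) {
            throw new IllegalStateException("no values");
        }
        return Math.sqrt(m2 / n);
    }
}
```

Feeding it the test.f values of group "a" from raw.json (3, 2, 1, 5, -1) gives approximately 2.0, since the squared deviations from the mean of 2 sum to 20 and 20 / 5 = 4.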

Tests

There are extensive tests for each of the aggregations in the https://github.com/caffinc/jaggr/blob/master/jaggr/jaggr/src/test directory.

Tests for the jaggr-utils module are in https://github.com/caffinc/jaggr/blob/master/jaggr/jaggr-utils/src/test.

Dependencies

These versions are not strict requirements; they were (probably) current as of 26 November 2016. It should be trivial to upgrade or downgrade them as required.

Both jaggr and jaggr-utils depend on junit for tests:

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>

jaggr does not have any other external dependencies, but has a test dependency on jaggr-utils.

jaggr-utils has the following dependencies:

<dependencies>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.6.2</version>
    </dependency>
</dependencies>

Help

If you face any issues trying to get this to work for you, shoot me an email: admin@caffinc.com.

Good luck!

Versions

Version	Release Date
0.5.0	Dec 1, 2016
0.4.0	Nov 30, 2016
0.3.0	Nov 29, 2016
0.2.2	Nov 28, 2016
0.2	Nov 27, 2016
0.1	Nov 27, 2016
