ZetaSketch

A collection of libraries for single-pass, distributed, sublinear-space approximate aggregation and sketching algorithms.

License: Apache License 2.0
GroupId: com.google.zetasketch
ArtifactId: zetasketch
Last Version: 0.1.0
Type: jar
Project URL: https://github.com/google/zetasketch
Source Code Management: https://github.com/google/zetasketch/tree/master


How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.google.zetasketch/zetasketch/ -->
<dependency>
    <groupId>com.google.zetasketch</groupId>
    <artifactId>zetasketch</artifactId>
    <version>0.1.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.google.zetasketch/zetasketch/
implementation 'com.google.zetasketch:zetasketch:0.1.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.google.zetasketch/zetasketch/
implementation("com.google.zetasketch:zetasketch:0.1.0")

Buildr:

'com.google.zetasketch:zetasketch:jar:0.1.0'

Ivy:

<dependency org="com.google.zetasketch" name="zetasketch" rev="0.1.0">
  <artifact name="zetasketch" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='com.google.zetasketch', module='zetasketch', version='0.1.0')
)

sbt:

libraryDependencies += "com.google.zetasketch" % "zetasketch" % "0.1.0"

Leiningen:

[com.google.zetasketch/zetasketch "0.1.0"]

Dependencies

runtime (5)

Group / Artifact                                 Type  Version
com.google.auto.value : auto-value-annotations   jar   1.6.3
com.google.code.findbugs : jsr305                jar   3.0.2
com.google.errorprone : error_prone_annotations  jar   2.3.2
it.unimi.dsi : fastutil                          jar   8.2.2
org.checkerframework : checker-qual              jar   2.8.1

Project Modules

There are no modules declared in this project.

ZetaSketch

ZetaSketch is a collection of libraries for single-pass, distributed, approximate aggregation and sketching algorithms.

These algorithms estimate statistics that are often too expensive to compute exactly.

The estimates use far less memory than exact calculations. For example, the HyperLogLog++ algorithm can estimate the number of daily active users without storing every user ID it has seen.

ZetaSketch currently includes libraries to implement the following algorithms:

Algorithm      Statistics                               Libraries
HyperLogLog++  Estimates the number of distinct values  Java

What is a sketch?

ZetaSketch libraries calculate statistics from sketches. A sketch is a compact summary of a large data stream. You can extract an estimate from a sketch to approximate a statistic of the original data, or merge sketches to summarize multiple data streams.

After choosing an algorithm, you can use its corresponding libraries to:

  • Create sketches
  • Add new data to existing sketches
  • Merge multiple sketches
  • Extract statistics from sketches
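
To make the four operations concrete, here is a minimal sketch of a much simpler structure than anything ZetaSketch ships: a linear-counting bitmap. The class `ToyBitmapSketch`, its methods, and the choice of hash are all hypothetical and exist only for illustration; they are not part of the ZetaSketch API.

```java
import java.util.BitSet;

/** Toy linear-counting sketch. Hypothetical illustration, not a ZetaSketch class. */
class ToyBitmapSketch {
    private final int numBits;
    private final BitSet bits;

    /** Create: an empty sketch with a fixed number of bits. */
    ToyBitmapSketch(int numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    /** Add: hash the value to one bit position and set that bit. */
    void add(String value) {
        bits.set(Math.floorMod(mix(value.hashCode()), numBits));
    }

    /** Merge: the sketch of a union of streams is the bitwise OR of their sketches. */
    void merge(ToyBitmapSketch other) {
        if (other.numBits != numBits) {
            throw new IllegalArgumentException("sketches must have the same size");
        }
        bits.or(other.bits);
    }

    /** Extract: linear-counting estimate, -m * ln(fraction of bits still zero). */
    long estimate() {
        int zeros = numBits - bits.cardinality();
        if (zeros == 0) {
            return numBits; // Saturated: the sketch can only report a lower bound.
        }
        return Math.round(-numBits * Math.log((double) zeros / numBits));
    }

    /** Murmur3 32-bit finalizer, used here to scramble String.hashCode(). */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        ToyBitmapSketch monday = new ToyBitmapSketch(1 << 14);
        ToyBitmapSketch tuesday = new ToyBitmapSketch(1 << 14);
        for (int i = 0; i < 600; i++) monday.add("user-" + i);
        for (int i = 400; i < 1000; i++) tuesday.add("user-" + i);
        monday.merge(tuesday); // users 400..599 appear in both streams
        System.out.println("Estimated distinct users: " + monday.estimate());
    }
}
```

Each sketch has a fixed size no matter how many values are added, and merging never needs the original data, which is what makes this style of aggregation suitable for distributed, single-pass use.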

HyperLogLog++

The HyperLogLog++ (HLL++) algorithm estimates the number of distinct values in a data stream. HLL++ builds on HyperLogLog and estimates that number more accurately for both very large and very small data streams.
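
For intuition about what such an estimator does, the following is a minimal sketch of the original HyperLogLog algorithm that HLL++ builds on. It is an assumption-laden teaching example: `ToyHyperLogLog`, its hash function, and its precision limits are hypothetical, and it omits the refinements that distinguish HLL++. It does not reflect ZetaSketch's internals.

```java
import java.nio.charset.StandardCharsets;

/** Textbook HyperLogLog, for illustration only. Not ZetaSketch's implementation. */
class ToyHyperLogLog {
    private final int precision;    // p: number of hash bits used to pick a register
    private final byte[] registers; // m = 2^p registers

    ToyHyperLogLog(int precision) {
        if (precision < 7 || precision > 18) {
            throw new IllegalArgumentException("precision must be in [7, 18]");
        }
        this.precision = precision;
        this.registers = new byte[1 << precision];
    }

    void add(String value) {
        long hash = hash64(value);
        int index = (int) (hash >>> (64 - precision)); // top p bits select a register
        long rest = hash << precision;                 // remaining bits, left-aligned
        // Rank = 1-based position of the first 1-bit in the remaining bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - precision) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    long estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r); // 2^(-r)
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant for m >= 128
        double raw = alpha * m * m / sum;          // normalized harmonic mean
        if (raw <= 2.5 * m && zeros > 0) {
            // Small-range correction: fall back to linear counting.
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    /** FNV-1a over UTF-8 bytes plus a Murmur3-style 64-bit finalizer (an arbitrary choice). */
    private static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        ToyHyperLogLog hll = new ToyHyperLogLog(12);
        for (int i = 0; i < 100_000; i++) hll.add("user-" + i);
        System.out.println("Estimate for 100000 distinct values: " + hll.estimate());
    }
}
```

The sketch stores only one small register per bucket, so memory use depends on the precision rather than on the number of values added. Adding the same value again never changes a register, which is why duplicates do not inflate the estimate.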

Creating a sketch

import com.google.zetasketch.HyperLogLogPlusPlus;

// Create a sketch for estimating the number of unique strings in a data stream.
// You can also create sketches for estimating the number of unique byte
// sequences, integers, and longs.
HyperLogLogPlusPlus<String> hll = new HyperLogLogPlusPlus.Builder().buildForStrings();

// You can also set a custom precision. The default normal and sparse precisions
// are 15 and 20, respectively.
HyperLogLogPlusPlus<String> hllCustomPrecision = new HyperLogLogPlusPlus.Builder()
    .normalPrecision(13).sparsePrecision(19).buildForStrings();
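
As a rough guide to what the normal precision trades off, the snippet below applies two facts from the published HyperLogLog analysis: a precision of p means 2^p registers, and the asymptotic relative standard error is about 1.04 / sqrt(2^p). The class `PrecisionMath` is hypothetical, and the figures are theoretical approximations, not accuracy guarantees documented by ZetaSketch.

```java
/** Back-of-the-envelope HyperLogLog sizing. Hypothetical helper, not a ZetaSketch class. */
class PrecisionMath {
    /** Number of registers for a given normal precision p. */
    static int registerCount(int precision) {
        return 1 << precision;
    }

    /** Approximate relative standard error from the HyperLogLog analysis. */
    static double relativeStandardError(int precision) {
        return 1.04 / Math.sqrt(registerCount(precision));
    }

    public static void main(String[] args) {
        for (int p : new int[] {13, 15}) {
            System.out.printf("precision=%d registers=%d relative error ~%.4f%n",
                p, registerCount(p), relativeStandardError(p));
        }
    }
}
```

Each extra bit of precision doubles the register count and shrinks the expected error by a factor of about 1.41, so lowering the precision from 15 to 13 uses a quarter of the registers at roughly twice the error.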

Adding new data to a sketch

// Add three strings to the `hll` sketch. You must first initialize an empty
// sketch and then add data to it.
hll.add("apple");
hll.add("orange");
hll.add("banana");

Merging sketches

// Merge `hll2` and `hll3` with `hll`. The sketches must have the same
// original data type and precision.
hll.merge(hll2);
hll.merge(hll3);
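
The precision requirement is easier to see with a minimal sketch of how a textbook HyperLogLog merge works on dense register arrays: the merged sketch takes the element-wise maximum, which only makes sense when both arrays have the same length. `RegisterMerge` is a hypothetical helper; how ZetaSketch represents and merges sketches internally is not described here.

```java
import java.util.Arrays;

/** Element-wise max merge of HyperLogLog registers. Illustration only. */
class RegisterMerge {
    static byte[] merge(byte[] left, byte[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("sketches must use the same precision");
        }
        byte[] merged = new byte[left.length];
        for (int i = 0; i < merged.length; i++) {
            // Each register keeps the largest rank seen by either input sketch.
            merged[i] = (byte) Math.max(left[i], right[i]);
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] a = {1, 5, 0, 2};
        byte[] b = {3, 2, 0, 2};
        System.out.println(Arrays.toString(merge(a, b)));
    }
}
```

Because the maximum is commutative, associative, and idempotent, sketches built on different machines can be merged in any order, and merging the same sketch twice has no further effect.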

Extracting cardinality estimates

// Return the estimate of the number of distinct values.
long result = hll.result();

How to use ZetaSketch

Please find the instructions for your build tool on the right side of https://search.maven.org/artifact/com.google.zetasketch/zetasketch

How to build ZetaSketch

ZetaSketch uses Gradle as its build system. To build the project, simply run:

./gradlew build

License

Apache License 2.0

Contributing

We are not currently accepting contributions to this project. Please feel free to file bugs and feature requests using GitHub's issue tracker.

Disclaimer

This is not an officially supported Google product.


Versions

  • 0.1.0