com.cldellow:manu-common

Utilities to manage timeseries data.

License	Eclipse Public License 1.0
GroupId	com.cldellow
ArtifactId	manu-common
Last Version	0.2.2
Release Date	Jan 29, 2018
Type	jar
Description	Utilities to manage timeseries data.
Project URL	https://github.com/cldellow/manu

Download manu-common

Filename	Size
manu-common-0.2.2.pom
manu-common-0.2.2.jar	5 KB
manu-common-0.2.2-sources.jar	3 KB
manu-common-0.2.2-javadoc.jar	37 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.cldellow/manu-common/ -->
<dependency>
    <groupId>com.cldellow</groupId>
    <artifactId>manu-common</artifactId>
    <version>0.2.2</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.cldellow/manu-common/
implementation 'com.cldellow:manu-common:0.2.2'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.cldellow/manu-common/
implementation("com.cldellow:manu-common:0.2.2")

Apache Buildr

'com.cldellow:manu-common:jar:0.2.2'

Apache Ivy

<dependency org="com.cldellow" name="manu-common" rev="0.2.2">
  <artifact name="manu-common" type="jar" />
</dependency>

Groovy Grape

@Grapes(
    @Grab(group='com.cldellow', module='manu-common', version='0.2.2')
)

Scala SBT

libraryDependencies += "com.cldellow" % "manu-common" % "0.2.2"

Leiningen

[com.cldellow/manu-common "0.2.2"]

Dependencies

test (3)

Group / Artifact	Type	Version
junit : junit	jar	4.12
com.pholser : junit-quickcheck-core	jar	0.7
com.pholser : junit-quickcheck-generators	jar	0.7

Project Modules

There are no modules declared in this project.

Manu: "Mostly archived, not updated"

A time series storage format for integers and floats, using efficient delta encodings from FastPFOR.

Examples: pageviews by article in Wikipedia, stock open/close/high/low prices, weather temperatures.

Components

manu-format, a library for maintaining the data on disk
manu-cli, a command-line tool for ingesting data into the format
manu-serve, a web server to expose the data over REST

Design criteria

Priorities

Cheap
- I'm doing this to drive a hobby project; my dream would be to host a variety of datasets for $10/month.
- A Fermi estimate suggests the Wikipedia pageviews dataset has about 100B datapoints over the last 10 years. This implies that storage costs will dominate.
Doesn’t need to be always-on
- This follows from being cheap: the ability to load subsets of data, or to run on spot instances, will be a useful tool to cut costs.
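
To make the Fermi estimate above concrete, here is a rough sketch of the arithmetic. The 100B datapoint count comes from the estimate; the bytes-per-datapoint figures (4 for a raw 32-bit integer, 1 for a hypothetical compressed encoding) are illustrative assumptions, not measurements of manu's format.

```java
// Back-of-the-envelope storage estimate. The class and method names are
// hypothetical; they are not part of manu.
class StorageEstimate {
    // Size in decimal gigabytes for a given number of datapoints.
    static double gigabytes(long datapoints, double bytesPerPoint) {
        return datapoints * bytesPerPoint / 1e9;
    }

    public static void main(String[] args) {
        long datapoints = 100_000_000_000L; // 100B, from the Fermi estimate

        // Assumption: raw 32-bit integers, 4 bytes each.
        System.out.println(gigabytes(datapoints, 4.0)); // prints 400.0

        // Assumption: a compressed encoding averaging 1 byte per datapoint.
        System.out.println(gigabytes(datapoints, 1.0)); // prints 100.0
    }
}
```

Under these assumptions the raw data alone is in the hundreds of gigabytes, which is why bytes per datapoint matter more than request throughput for a small monthly budget.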

Non-priorities

Concurrent / fast writes
- These can happen offline.
Fast reads
- The Pareto principle will likely apply to queries: 1% of keys will get 99% of reads. We can use Varnish or similar to cache at the application level.

Assumptions

Dense datasets
- Keys: if we see a key once, we expect to see it again.
- Values: if key X has a datapoint at T1, we expect most other keys will as well.
Correlated values
- Value for key X at T1 is likely related to value at T2.
Some datasets can be lossy
- Wikipedia pageviews, e.g., are likely insensitive to precision so long as the trend is generally correct.
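
The "correlated values" assumption is what makes delta encoding pay off: when consecutive values are close, their differences are small numbers that need few bits. The sketch below shows plain delta encoding and decoding over an int array. It is only an illustration of the idea; it does not use FastPFOR or manu's actual on-disk format, and the class name and sample values are made up.

```java
import java.util.Arrays;

// Minimal delta-encoding sketch (hypothetical names, not manu's API).
class DeltaSketch {
    // Replace each value with its difference from the previous value.
    // The first value is stored as a delta from 0.
    static int[] encode(int[] values) {
        int[] deltas = new int[values.length];
        int prev = 0;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - prev;
            prev = values[i];
        }
        return deltas;
    }

    // Rebuild the original values with a running sum of the deltas.
    static int[] decode(int[] deltas) {
        int[] values = new int[deltas.length];
        int prev = 0;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];
            values[i] = prev;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] pageviews = {1000, 1004, 998, 1010, 1007}; // made-up sample
        int[] deltas = encode(pageviews);
        System.out.println(Arrays.toString(deltas));                 // [1000, 4, -6, 12, -3]
        System.out.println(Arrays.equals(pageviews, decode(deltas))); // true
    }
}
```

After the first entry, every delta in the sample fits in a handful of bits, whereas the raw values each need at least 10. A bit-packing scheme such as the ones FastPFOR provides would then store those small deltas compactly.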

Obligatory

Credit: Our Greatest Asset, Saturday Morning Breakfast Cereal

Versions

Version
0.2.2 Jan 29, 2018
0.2.1 Jan 27, 2018
0.2.0 Jan 18, 2018
