indeed-mph-table

Minimal Perfect Hash Tables

License: Apache-2.0
GroupId: com.indeed
ArtifactId: mph-table
Last Version: 1.0.5
Type: jar
Description: indeed-mph-table, Minimal Perfect Hash Tables
Source Code Management: https://github.com/indeedeng/mph-table

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.indeed/mph-table/ -->
<dependency>
    <groupId>com.indeed</groupId>
    <artifactId>mph-table</artifactId>
    <version>1.0.5</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.indeed/mph-table/
implementation 'com.indeed:mph-table:1.0.5'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.indeed/mph-table/
implementation ("com.indeed:mph-table:1.0.5")

Buildr:

'com.indeed:mph-table:jar:1.0.5'

Ivy:

<dependency org="com.indeed" name="mph-table" rev="1.0.5">
  <artifact name="mph-table" type="jar" />
</dependency>

Groovy Grape:

@Grapes(
@Grab(group='com.indeed', module='mph-table', version='1.0.5')
)

SBT:

libraryDependencies += "com.indeed" % "mph-table" % "1.0.5"

Leiningen:

[com.indeed/mph-table "1.0.5"]

Dependencies

compile (10)

Group / Artifact Type Version
log4j : log4j jar 1.2.14
org.slf4j : slf4j-api jar 1.7.5
org.slf4j : slf4j-log4j12 jar 1.7.5
com.google.guava : guava jar 16.0.1
it.unimi.dsi : fastutil jar 6.5.15
it.unimi.dsi : sux4j jar 4.0.0
com.indeed : util-core jar 1.0.24
com.indeed : util-io jar 1.0.24
com.indeed : util-mmap jar 1.0.24
com.indeed : util-serialization jar 1.0.24

test (3)

Group / Artifact Type Version
junit : junit jar 4.12
com.pholser : junit-quickcheck-core jar 0.7
com.pholser : junit-quickcheck-generators jar 0.7

Project Modules

There are no modules declared in this project.

Minimal Perfect Hash Tables

About

Minimal perfect hash tables are immutable key/value stores with efficient space utilization and fast reads. They are well suited to tables that are built by batch processes and shipped to multiple servers.
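To make the idea concrete, here is a toy sketch of what "minimal perfect" means: every key in a fixed set maps to its own slot in the range 0 to n-1, so values can live in a dense array with no empty slots and no collision handling. It is an illustration only and not the construction this library uses. The class name ToyMinimalPerfectHash, the mixing function, and the brute-force seed search are all assumptions made for the example, and the search is only practical for a handful of distinct keys.

```java
import java.util.Arrays;
import java.util.List;

public class ToyMinimalPerfectHash {

    // Mixes the key's hash code with a seed and maps the result into [0, n).
    static int slot(String key, int seed, int n) {
        int h = key.hashCode() ^ seed;
        h *= 0x9E3779B1;          // multiplicative mixing step
        h ^= h >>> 16;
        return Math.floorMod(h, n);
    }

    // Brute-force search: try seeds until every key lands in its own slot.
    // Keys must be distinct (and have distinct hash codes) for a seed to exist.
    static int findSeed(List<String> keys) {
        int n = keys.size();
        for (int seed = 0; seed < Integer.MAX_VALUE; seed++) {
            boolean[] used = new boolean[n];
            boolean collisionFree = true;
            for (String key : keys) {
                int s = slot(key, seed, n);
                if (used[s]) {
                    collisionFree = false;
                    break;
                }
                used[s] = true;
            }
            if (collisionFree) {
                return seed;
            }
        }
        throw new IllegalStateException("no collision-free seed found");
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date", "elderberry");
        int n = keys.size();
        int seed = findSeed(keys);

        // Dense value array: exactly one slot per key.
        long[] values = new long[n];
        for (String key : keys) {
            values[slot(key, seed, n)] = key.length();
        }
        for (String key : keys) {
            System.out.println(key + " -> " + values[slot(key, seed, n)]);
        }
    }
}
```

Because the key set is fixed when the seed is found, the resulting structure is naturally immutable, which fits the build-once, ship-everywhere use described above. Note that a key outside the original set still hashes to some valid slot, so a lookup can only detect a missing key if the keys themselves are stored and compared.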

Usage

Indeed MPH is available on Maven Central. Add the following dependency:

<dependency>
    <groupId>com.indeed</groupId>
    <artifactId>mph-table</artifactId>
    <version>1.0.5</version>
</dependency>

The primary interfaces are TableReader, which opens a reader on an existing table; TableWriter, which builds a table; and TableConfig, which specifies the configuration for the writer.

How to write a table:

// configure how keys and values are serialized
final TableConfig<Long, Long> config = new TableConfig<Long, Long>()
    .withKeySerializer(new SmartLongSerializer())
    .withValueSerializer(new SmartVLongSerializer());
// build the entries: each key i maps to its square
final Set<Pair<Long, Long>> entries = new HashSet<>();
for (long i = 0; i < 20; ++i) {
    entries.add(new Pair<>(i, i * i));
}
// write the table to the "squares" path
TableWriter.write(new File("squares"), config, entries);

How to read a table:

try (final TableReader<Long, Long> reader = TableReader.open("squares")) {
    final Long value = reader.get(3L);          // get one
    for (final Pair<Long, Long> p : reader) {   // iterate over all
        ...
    }
}

Command Line

In addition to the Java API, TableReader and TableWriter provide convenience command-line interfaces to read and write tables, allowing you to quickly get started without writing any code:

# print all key-values in a table as TSV
$ java com.indeed.mph.TableReader --dump <table>

# print the value for a single key
$ java com.indeed.mph.TableReader --get <key> <table>

# create a table from a TSV file of words with counts
$ java com.indeed.mph.TableWriter --valueSerializer .SmartVLongSerializer <table to create> <counts.tsv>

# create a table from a TSV file mapping movie ids to lists of actor names (compressed by reference)
$ java com.indeed.mph.TableWriter --keySerializer .SmartVLongSerializer --valueSerializer '.SmartListSerializer(.SmartDictionarySerializer)' <table to create> <movies.tsv>

# same as above, not actually storing the movie ids but still allowing retrieval by them
$ java com.indeed.mph.TableWriter --keyStorage IMPLICIT --keySerializer .SmartVLongSerializer --valueSerializer '.SmartListSerializer(.SmartDictionarySerializer)' <table to create> <movies.tsv>
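The TableWriter commands above read their entries from a TSV file. As a rough sketch of how such an input might be produced, the following stand-alone program counts words and writes them out, assuming each line holds a key, a tab, and a value (the layout is an assumption here, inferred from the "TSV file of words with counts" description). The class name WordCountsTsv and the sample text are hypothetical; counts.tsv is the file name used in the command above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountsTsv {

    // Counts whitespace-separated words; a TreeMap keeps the output order deterministic.
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Renders one "key<TAB>value" line per entry (assumed input layout).
    static List<String> toTsvLines(Map<String, Long> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            lines.add(entry.getKey() + "\t" + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Output path defaults to counts.tsv; pass another path as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "counts.tsv");
        Map<String, Long> counts = countWords("the quick brown fox jumps over the lazy dog the end");
        Files.write(out, toTsvLines(counts), StandardCharsets.UTF_8);
    }
}
```

The resulting file could then be passed as the <counts.tsv> argument in the word-count command, with the value serializer chosen to match the numeric counts.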

Code of Conduct

This project is governed by the Contributor Covenant v1.4.1.

License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

com.indeed

Indeed Engineering

Versions

Version
1.0.5
1.0.4
1.0.3
1.0.2
1.0.1
1.0.0