bikey

Bikey: low memory footprint Map and Set implementation on objects with composited keys

License

License

The Apache Software License, Version 2.0
Categories

Categories

KeY Data Data Formats Formal Verification
GroupId

GroupId

com.jerolba
ArtifactId

ArtifactId

bikey
Last Version

Last Version

0.9.0
Release Date

Release Date

Type

Type

jar
Description

Description

bikey
Bikey: low memory footprint Map and Set implementation on objects with composited keys
Project URL

Project URL

https://github.com/jerolba/bikey
Source Code Management

Source Code Management

https://github.com/jerolba/bikey

Download bikey

How to add to project

<!-- https://jarcasting.com/artifacts/com.jerolba/bikey/ -->
<dependency>
    <groupId>com.jerolba</groupId>
    <artifactId>bikey</artifactId>
    <version>0.9.0</version>
</dependency>
// https://jarcasting.com/artifacts/com.jerolba/bikey/
implementation 'com.jerolba:bikey:0.9.0'
// https://jarcasting.com/artifacts/com.jerolba/bikey/
implementation ("com.jerolba:bikey:0.9.0")
'com.jerolba:bikey:jar:0.9.0'
<dependency org="com.jerolba" name="bikey" rev="0.9.0">
  <artifact name="bikey" type="jar" />
</dependency>
@Grapes(
@Grab(group='com.jerolba', module='bikey', version='0.9.0')
)
libraryDependencies += "com.jerolba" % "bikey" % "0.9.0"
[com.jerolba/bikey "0.9.0"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

There are no modules declared in this project.

Maven Central Build Status Download Codecov License Javadocs


Bikey

Bikey implements Map and Set data structures with two keys minimizing memory consumption.

Why Bikey collections?

Current collections libraries (Guava, Commons Collection, Eclipse Collections) have poor or not support to Maps and Sets with two keys.

Implementing it manually with a Map<R, Map<C, V>>, Map<Tuple<R, C>, V> or a Set<Tuple<R, C>> consumes a lot of memory, and choosing an incorrect hashCode function for Tuple (or equivalent) class can penalize memory and CPU consumption.

Bikey Map collection can reduce to a 15%-30% of consumed memory by a traditional double map (depending on the map fill rate) and Bikey Set collection can reduce to a 1% of consumed memory by a Set<Tuple>, with none or low penalization in access time.

Some Quick Examples

BikeyMap API is defined like the Map interface but everywhere a key is needed, you must provide both key values.

To simplify the example Strings has been used as keys, but any object that implements equals and hashCode can be used as row or column key. You can also use any kind of object as value. I've used Integer to simplify the following code:

BikeyMap<String, String, Integer> stock = new TableBikeyMap<>();
stock.put("shirt-ref-123", "store-76", 10);
stock.put("pants-ref-456", "store-12", 24);
...
stock.put("tie-ref-789", "store-23", 2);

Integer available = stock.get("shirt-ref-1234", "store-45");

//Total stock in store-123
stock.entrySet().stream()
     .filter(entry -> entry.getColumn().equals("store-123"))
     .mapToInt(entry -> entry.getValue())
     .sum();

//Total stock in pants-ref-457
stock.entrySet().stream()
     .filter(entry -> entry.getRow().equals("pants-ref-457"))
     .mapToInt(entry -> entry.getValue())
     .sum();

//All products included
Set<String> products = stock.rowKeySet();

//All stores included
Set<String> stores = stock.columnKeySet();

//Contains a product and store?
if (stock.containsKey("tie-ref-789", "store-23")) {
    ....
}

//Get all product/stores presents in the map
BikeySet<String, String> productStores = map.bikeySet();

//BikeySet<R, C> also implements Set<Bikey<R, C>>
Set<Bikey<String, String>> productStoresSet = map.bikeySet();

//Get products and stores with stock
BikeySet<String, String> withStock = stock.entrySet().stream()
    .filter(entry -> entry.getValue() > 0)
    .map(BikeyEntry::getKey)
    .collect(BikeyCollectors.toSet());

//Do something with each element in the map
stock.forEach((product, store, units) -> {
    System.out.println("Product " + product + " has " + units + " in store " + store);
});

BikeySet API is defined like the Set interface but everywhere an element is used, changes to two values:

BikeySet<String, String> avengerFilms = new TableBikeySet<>();
avengerFilms.add("Hulk", "The Avengers");
avengerFilms.add("Iron Man", "The Avengers");
avengerFilms.add("Thor", "Avengers: Age of Ultron");
avengerFilms.add("Thor", "Thor: Ragnarok");
avengerFilms.add("Captain America", "Avengers: Infinity War");
....

if (avengerFilms.contains("Iron Man", "Black Panther")) {
    ....
}

//Films in the Set
Set<String> filmsInSet = avengerFilms.columnKeySet();

//Avengers in the Set
Set<String> avengersInSet = avengerFilms.rowKeySet();

//Films with Iron Man
List<String> ironManFilms = avengerFilms.stream()
    .filter(entry -> entry.getRow().equals("Iron Man"))
    .map(Bikey::getColumn)
    .collect(toList());

//Call to a BiFunction for each element in the Set
bikeySet.forEach(this::doSomething);

public void doSomething(String avenger, String film) {
  ....
}

Implementations

BikeyMap<R, C ,V> has two implementations:

  • TableBikeyMap<R, C ,V>: optimized for memory consumption, and with performance similar to a double map or tuple map version.
  • MatrixBikeyMap<R, C V: optimizes performance, but with the disadvantage of consuming a little more memory with low fill rates.

depending on your business logic, you can use one or the other.

MatrixBikeyMap behaves like a matrix and grows quickly in memory consumption, but then it remains stable. It's recommended only if the fill rate is greater than 60% or access time to their elements is important. By default we recommend to use TableBikey implementation.

Dependency

Bikey is uploaded to Maven Central Repository and to use it, you need to add the following Maven dependency:

<dependency>
  <groupId>com.jerolba</groupId>
  <artifactId>bikey</artifactId>
  <version>0.9.0</version>
</dependency>

in Gradle:

implementation 'com.jerolba:bikey:0.9.0'

or download the single jar from Maven Central Repository.

Benchmarks

Execute your own benchmarks before deciding to use this library, but as a reference you can start with these numbers:

Memory

Compared to Map<R, Map<C, V>> and Map<Tuple<R, C>, V>, the memory consumed filling a map with 10.000 x 1.000 elements is:

Compared to HashMap<R, HashSet<C>> and HashSet<Tuple<R, C>> implementations, the memory consumed filling a Set with 10.000 x 1.000 elements is:

Performance

To create and fill randomly different maps in each implementation, the time spent is:

To find randomly different maps in each implementation, the time spent is:

To create and fill randomly a Set with 10.000 x 1.000 elements, the time spent is:

To check randomly the existence of each element in a Set with 10.000 x 1.000 elements, the time spent is:

Contribute

Feel free to dive in! Open an issue or submit PRs.

Any contributor and maintainer of this project follows the Contributor Covenant Code of Conduct.

License

Apache 2 © Jerónimo López

Versions

Version
0.9.0