Correlated Iterators Processor

Sequentially process correlated data from sorted iterators.

GroupId: com.teketik
ArtifactId: cip
Last Version: 1.0
Type: jar
Description: Correlated Iterators Processor. Sequentially process correlated data from sorted iterators.
Project URL: https://github.com/antoinemeyer/correlated-iterators-processor
Source Code Management: https://github.com/antoinemeyer/correlated-iterators-processor

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.teketik/cip/ -->
<dependency>
    <groupId>com.teketik</groupId>
    <artifactId>cip</artifactId>
    <version>1.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.teketik/cip/
implementation 'com.teketik:cip:1.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.teketik/cip/
implementation("com.teketik:cip:1.0")

Buildr:

'com.teketik:cip:jar:1.0'

Ivy:

<dependency org="com.teketik" name="cip" rev="1.0">
  <artifact name="cip" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='com.teketik', module='cip', version='1.0')
)

SBT:

libraryDependencies += "com.teketik" % "cip" % "1.0"

Leiningen:

[com.teketik/cip "1.0"]

Dependencies

test (1)

Group / Artifact                      | Type | Version
org.junit.jupiter : junit-jupiter-api | jar  | 5.7.1

Project Modules

There are no modules declared in this project.

Correlated Iterators Processor

The goal of this module is to offer a convenient and efficient way to iterate over correlated data contained within multiple sorted iterators.

Each iteration over the resulting sorted stream delivers all the data correlated by one key, so that it can be processed as a single unit of work.

Context

Banking institutions often provide flat CSV files containing account information such as positions and transactions. These files are usually sorted by account number and can be too large to load entirely into memory.

With Correlated Iterators Processor, you can open a streaming iterator on each of those files and process all the data related to one account as a single chunk.
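As a rough illustration of what a streaming iterator over such a file could look like, the sketch below reads rows lazily using only the JDK. The class CsvRowIterators, the PositionRow type and its two fields, and the assumed layout (one header line, comma-separated values without quoting) are all hypothetical and not part of the library.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.Iterator;

// Hypothetical helper for illustration only; not part of the cip library.
final class CsvRowIterators {

    /** One parsed row. The field names are assumptions made for this sketch. */
    static final class PositionRow {
        final String accountNumber;
        final String symbol;

        PositionRow(String accountNumber, String symbol) {
            this.accountNumber = accountNumber;
            this.symbol = symbol;
        }
    }

    /**
     * Returns an iterator that parses one line at a time, so the whole file
     * never has to be held in memory. The caller remains responsible for
     * closing the reader once iteration is finished.
     */
    static Iterator<PositionRow> positions(Reader reader) {
        return new BufferedReader(reader).lines()
                .skip(1) // assumption: the first line is a header
                .map(CsvRowIterators::parse)
                .iterator();
    }

    // Naive split: assumes no quoted fields and no embedded commas.
    private static PositionRow parse(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least 2 fields: " + line);
        }
        return new PositionRow(fields[0], fields[1]);
    }
}
```

Because both BufferedReader.lines() and Stream.iterator() are lazy, each row is parsed only when the consumer asks for the next element. A real CSV parser would be needed for quoted fields.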

Example

Consider the two following data sets:

  • Data Set 1
Key Value
B value11
C value12
C value13
D value14
D value15
E value16
  • Data Set 2
Key Value
A value21
A value22
C value23
C value24
C value25
D value26
G value27

Opening an iterator on each of those two data sets and running both through CIP with Key as the CorrelationKey yields the following chunks, one per key:

Key | Data Set 1 Values | Data Set 2 Values
A   | -                 | value21, value22
B   | value11           | -
C   | value12, value13  | value23, value24, value25
D   | value14, value15  | value26
E   | value16           | -
G   | -                 | value27

The corresponding Java code would be:

CorrelatedIterables.correlate(
    dataSet1.iterator(), EntryA.class,
    dataSet2.iterator(), EntryB.class,
    new CorrelationDoubleStreamConsumer<String, EntryA, EntryB>() {
        @Override
        public void consume(String key, List<EntryA> aElements, List<EntryB> bElements) {
            //process the chunk
        }
    }
);
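To make the grouping behaviour concrete without relying on the library, here is a minimal, library-independent sketch of how two key-sorted iterators can be walked in lockstep. It is not the library's implementation. The names SortedCorrelationSketch, ChunkConsumer, Peeking and entry are hypothetical, keys and values are plain strings, and both inputs are assumed to be sorted in ascending key order (the sketch does not verify this).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the cip library's implementation.
final class SortedCorrelationSketch {

    /** Receives one key together with every value found for it in each input. */
    interface ChunkConsumer {
        void consume(String key, List<String> left, List<String> right);
    }

    /** Convenience factory for a key/value pair. */
    static Map.Entry<String, String> entry(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    /** Wraps an iterator so its next element can be inspected before it is consumed. */
    private static final class Peeking {
        private final Iterator<Map.Entry<String, String>> source;
        private Map.Entry<String, String> head;

        Peeking(Iterator<Map.Entry<String, String>> source) {
            this.source = source;
            advance();
        }

        void advance() {
            head = source.hasNext() ? source.next() : null;
        }

        /** Removes and returns the leading run of values whose key equals the given key. */
        List<String> drain(String key) {
            List<String> values = new ArrayList<>();
            while (head != null && head.getKey().equals(key)) {
                values.add(head.getValue());
                advance();
            }
            return values;
        }
    }

    /** Calls the consumer once per distinct key, in ascending key order. */
    static void correlate(Iterator<Map.Entry<String, String>> a,
                          Iterator<Map.Entry<String, String>> b,
                          ChunkConsumer consumer) {
        Peeking left = new Peeking(a);
        Peeking right = new Peeking(b);
        while (left.head != null || right.head != null) {
            // The next chunk belongs to the smallest key still pending on either side.
            String key;
            if (left.head == null) {
                key = right.head.getKey();
            } else if (right.head == null) {
                key = left.head.getKey();
            } else {
                String l = left.head.getKey();
                String r = right.head.getKey();
                key = l.compareTo(r) <= 0 ? l : r;
            }
            consumer.consume(key, left.drain(key), right.drain(key));
        }
    }
}
```

Only the elements of the current key are held in memory at any time, which is what makes this approach suitable for inputs that are too large to load in full.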

Usage

Maven dependency:

<dependency>
  <groupId>com.teketik</groupId>
  <artifactId>cip</artifactId>
  <version>1.0</version>
</dependency>

Main classes:

CorrelatedIterables contains a collection of convenient iterators to process multiple correlated iterators. If this does not contain what you need, have a look at CorrelatedIterable.

The Java classes being iterated should contain a field annotated with @CorrelationKey, which is used to find the correlations across all the iterators.
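The sketch below shows what an annotated element class might look like and how an annotated field can be located through reflection. To stay self-contained it declares its own stand-in CorrelationKey annotation; the real annotation ships with the library, and its package and attributes are not shown here. The Position class, its fields, and the readKey helper are hypothetical, and the reflection lookup is only one plausible way to read such a key, not a description of the library's internals.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical illustration only; the annotation below is a local stand-in.
final class CorrelationKeySketch {

    /** Stand-in for the library's annotation, declared here so the sketch compiles alone. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface CorrelationKey {
    }

    /** Example element type: rows are correlated by account number. */
    static final class Position {
        @CorrelationKey
        private final String accountNumber;
        private final String symbol;

        Position(String accountNumber, String symbol) {
            this.accountNumber = accountNumber;
            this.symbol = symbol;
        }
    }

    /** Returns the value of the first field carrying the stand-in annotation. */
    static Object readKey(Object element) {
        for (Field field : element.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(CorrelationKey.class)) {
                try {
                    field.setAccessible(true); // the field is private
                    return field.get(element);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field, e);
                }
            }
        }
        throw new IllegalArgumentException(
                "No @CorrelationKey field on " + element.getClass().getName());
    }
}
```

With this stand-in, readKey(new Position("001", "AAPL")) returns the account number, which is the value a correlation step would compare across iterators.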

Versions

Version
1.0