com.spotify.crunch:crunch-lib

Useful reusable high-level components for common use-cases in processing data with Apache Crunch

GroupId: com.spotify.crunch
ArtifactId: crunch-lib
Last Version: 0.0.5
Type: jar
Project URL: https://github.com/spotify/crunch-lib
Source Code Management: https://github.com/spotify/crunch-lib

How to add to project

Maven:

<dependency>
    <groupId>com.spotify.crunch</groupId>
    <artifactId>crunch-lib</artifactId>
    <version>0.0.5</version>
</dependency>

Gradle (Groovy DSL):

implementation 'com.spotify.crunch:crunch-lib:0.0.5'

Gradle (Kotlin DSL):

implementation("com.spotify.crunch:crunch-lib:0.0.5")

Buildr:

'com.spotify.crunch:crunch-lib:jar:0.0.5'

Ivy:

<dependency org="com.spotify.crunch" name="crunch-lib" rev="0.0.5">
  <artifact name="crunch-lib" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='com.spotify.crunch', module='crunch-lib', version='0.0.5')
)

SBT:

libraryDependencies += "com.spotify.crunch" % "crunch-lib" % "0.0.5"

Leiningen:

[com.spotify.crunch/crunch-lib "0.0.5"]

Dependencies

compile (1)

Group / Artifact                                    Type  Version
org.apache.crunch : crunch-core                     jar   0.11.0-hadoop2

provided (2)

Group / Artifact                                    Type  Version
org.apache.hadoop : hadoop-common                   jar   2.2.0
org.apache.hadoop : hadoop-mapreduce-client-core    jar   2.2.0

test (1)

Group / Artifact                                    Type  Version
junit : junit                                       jar   4.11

Project Modules

There are no modules declared in this project.

crunch-lib

This repository contains useful, reusable high-level components for common use cases in processing data with Apache Crunch.

To try it, pull it from the central Maven repository with this snippet (or the equivalent for Gradle, sbt, ...):

<dependency>
   <groupId>com.spotify.crunch</groupId>
   <artifactId>crunch-lib</artifactId>
   <version>0.0.5</version>
</dependency>

AvroCollections

  • extract pulls individual fields out of a PCollection of Avro records by their field names, without the need for trivial MapFns
  • keyByAvroField keys a PCollection of Avro records by a specific field, using its name, without the need for trivial MapFns
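
To make the two operations above concrete, here is a minimal plain-Java sketch of the idea. It does not use Crunch or Avro at all: a record is modelled as a `Map` from field name to value, and the class and method names (`FieldAccessSketch`, `keyByField`) are hypothetical, not the crunch-lib API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: NOT the crunch-lib API. Records are plain Maps here.
final class FieldAccessSketch {

    // Pull one named field out of every record.
    static List<Object> extract(List<Map<String, Object>> records, String field) {
        List<Object> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            out.add(record.get(field));
        }
        return out;
    }

    // Pair every record with the value of one named field as its key.
    static List<Map.Entry<Object, Map<String, Object>>> keyByField(
            List<Map<String, Object>> records, String field) {
        List<Map.Entry<Object, Map<String, Object>>> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            out.add(new SimpleImmutableEntry<>(record.get(field), record));
        }
        return out;
    }
}
```

The point of the sketch is that the caller supplies only a field name; no per-field mapping function has to be written.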

SPTables

  • swapKeyValue swaps the key and the value parts of a PTable
  • negateCounts negates the value part of a long-valued table, which makes it easy to sort in descending order
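
The reason negating counts helps with descending sorts can be shown with a small plain-Java sketch. A table is modelled as a list of key/value entries; the class `TableOpsSketch` and the helper `keysByDescendingCount` are hypothetical names for illustration, and nothing here calls Crunch.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: NOT the crunch-lib API. A "table" is a list of entries.
final class TableOpsSketch {

    // Turn every (key, value) pair into (value, key).
    static <K, V> List<Map.Entry<V, K>> swapKeyValue(List<Map.Entry<K, V>> table) {
        List<Map.Entry<V, K>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : table) {
            out.add(new SimpleImmutableEntry<>(e.getValue(), e.getKey()));
        }
        return out;
    }

    // Replace every count c with -c.
    static <K> List<Map.Entry<K, Long>> negateCounts(List<Map.Entry<K, Long>> table) {
        List<Map.Entry<K, Long>> out = new ArrayList<>();
        for (Map.Entry<K, Long> e : table) {
            out.add(new SimpleImmutableEntry<>(e.getKey(), Long.valueOf(-e.getValue())));
        }
        return out;
    }

    // An ordinary ascending sort on negated counts yields the keys
    // in descending order of the original counts.
    static <K> List<K> keysByDescendingCount(List<Map.Entry<K, Long>> table) {
        List<Map.Entry<K, Long>> negated = negateCounts(table);
        negated.sort(Map.Entry.comparingByValue());
        List<K> keys = new ArrayList<>();
        for (Map.Entry<K, Long> e : negated) {
            keys.add(e.getKey());
        }
        return keys;
    }
}
```

This matters in a framework where only an ascending sort is readily available: negating the values avoids having to write a custom comparator.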

TopLists

  • topNYbyX creates a top-list of elements in the provided PTable, categorised by the key of the input table and using the count of the value part of the input table.
  • globalTopList creates a list of unique items in the input collection with their count, sorted descending by their frequency.
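
As a rough illustration of what the two top-list operations compute, the following plain-Java sketch counts items and ranks them. It is not the crunch-lib implementation: the class name `TopListSketch`, the in-memory collections, the return types, and the tie-breaking rule (first seen wins) are all assumptions made for the example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: NOT the crunch-lib API.
final class TopListSketch {

    // Count each distinct item and return (item, count), highest count first.
    static <T> List<Map.Entry<T, Long>> globalTopList(List<T> items) {
        Map<T, Long> counts = new LinkedHashMap<>();
        for (T item : items) {
            counts.merge(item, 1L, Long::sum);
        }
        List<Map.Entry<T, Long>> out = new ArrayList<>();
        for (Map.Entry<T, Long> e : counts.entrySet()) {
            out.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        // List.sort is stable, so ties keep first-seen order.
        out.sort(Map.Entry.<T, Long>comparingByValue().reversed());
        return out;
    }

    // For each key X, keep the n most frequent values Y with their counts.
    static <X, Y> Map<X, List<Map.Entry<Y, Long>>> topNYbyX(
            List<Map.Entry<X, Y>> table, int n) {
        Map<X, List<Y>> grouped = new LinkedHashMap<>();
        for (Map.Entry<X, Y> e : table) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        Map<X, List<Map.Entry<Y, Long>>> out = new LinkedHashMap<>();
        for (Map.Entry<X, List<Y>> g : grouped.entrySet()) {
            List<Map.Entry<Y, Long>> ranked = globalTopList(g.getValue());
            int limit = Math.min(n, ranked.size());
            out.put(g.getKey(), new ArrayList<>(ranked.subList(0, limit)));
        }
        return out;
    }
}
```

For example, with a table of (country, track) pairs, `topNYbyX(table, 10)` in this sketch would give the ten most frequent tracks per country.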

Averages

  • meanValue calculates the mean value for each key in the provided numerically-valued PTable.
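
A per-key mean can be sketched in plain Java by keeping a running sum and count for each key. This is only an illustration of the computation under assumed names (`MeanSketch`, an in-memory list of entries); it is not how crunch-lib is necessarily implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: NOT the crunch-lib API.
final class MeanSketch {

    // Mean of the values for each key, computed from a running (sum, count).
    static <K, V extends Number> Map<K, Double> meanValue(List<Map.Entry<K, V>> table) {
        Map<K, double[]> acc = new LinkedHashMap<>(); // acc[0] = sum, acc[1] = count
        for (Map.Entry<K, V> e : table) {
            double[] a = acc.computeIfAbsent(e.getKey(), k -> new double[2]);
            a[0] += e.getValue().doubleValue();
            a[1] += 1;
        }
        Map<K, Double> out = new LinkedHashMap<>();
        for (Map.Entry<K, double[]> e : acc.entrySet()) {
            out.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return out;
    }
}
```

Carrying the sum and the count separately, rather than a running average, is what lets partial results be merged by simple addition, which is the usual reason for this shape in a distributed setting.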

Percentiles

  • distributed / inMemory calculate a set of percentiles for each key in the provided numerically-valued PTable.
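
The in-memory flavour of a per-key percentile calculation can be sketched as "group, sort, pick by rank". The plain-Java example below assumes the nearest-rank definition of a percentile (index = ceil(p * n) - 1); the source does not say which definition crunch-lib uses, and the class name `PercentileSketch` and the return type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: NOT the crunch-lib API.
// Assumption: nearest-rank percentiles, with p given as a fraction in (0, 1].
final class PercentileSketch {

    // For each key, return one value per requested percentile, in request order.
    static <K> Map<K, List<Double>> inMemory(List<Map.Entry<K, Double>> table, double... ps) {
        Map<K, List<Double>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, Double> e : table) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        Map<K, List<Double>> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<Double>> g : grouped.entrySet()) {
            // All values for one key are sorted in memory.
            List<Double> sorted = new ArrayList<>(g.getValue());
            Collections.sort(sorted);
            List<Double> result = new ArrayList<>();
            for (double p : ps) {
                int rank = (int) Math.ceil(p * sorted.size());
                int index = Math.max(0, Math.min(sorted.size() - 1, rank - 1));
                result.add(sorted.get(index));
            }
            out.put(g.getKey(), result);
        }
        return out;
    }
}
```

Sorting each key's values in memory is simple but only works when a single key's values fit in memory, which is presumably why a separate distributed variant exists.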

DoFns

  • detach wraps a DoFn operating as a reducer so that each value given by the Iterable is already detached (preventing object reuse problems)
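
The object reuse problem that detach guards against can be reproduced without Hadoop. In the plain-Java sketch below, an Iterable hands back the same mutable object on every step, much as a reducer's value iterator may; keeping references then yields the last value repeated. The sketch wraps an Iterable rather than a DoFn, and `DetachSketch`, `Box`, and the copier argument are hypothetical names, not the crunch-lib API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustration only: NOT the crunch-lib API.
final class DetachSketch {

    // Mutable value standing in for a reused record object.
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // An Iterable that returns the SAME Box instance on every call to next().
    static Iterable<Box> reusingIterable(int... values) {
        Box shared = new Box(0);
        return () -> new Iterator<Box>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < values.length; }
            @Override public Box next() { shared.value = values[i++]; return shared; }
        };
    }

    // Wrap an Iterable so every element is copied before the caller sees it.
    static <T> Iterable<T> detach(Iterable<T> source, UnaryOperator<T> copier) {
        return () -> {
            Iterator<T> it = source.iterator();
            return new Iterator<T>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public T next() { return copier.apply(it.next()); }
            };
        };
    }

    // Keep references to every element first, and only then read them back.
    static List<Integer> collectThenRead(Iterable<Box> boxes) {
        List<Box> kept = new ArrayList<>();
        for (Box b : boxes) {
            kept.add(b);
        }
        List<Integer> out = new ArrayList<>();
        for (Box b : kept) {
            out.add(b.value);
        }
        return out;
    }
}
```

Reading the raw iterable through `collectThenRead` returns the final value three times for inputs 1, 2, 3, while reading it through `detach(..., b -> new Box(b.value))` returns 1, 2, 3 as intended.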
com.spotify.crunch

Spotify

Versions

Version
0.0.5
0.0.2