com.expedia.www:sample-key-extractor

Packages to send ("pipe") Haystack data to external sinks (like AWS Firehose or another Kafka queue)

License

Categories

KeY, Data Formats, Formal Verification
GroupId

com.expedia.www
ArtifactId

sample-key-extractor
Last Version

2.0.0
Release Date

Type

jar
Description

Packages to send ("pipe") Haystack data to external sinks (like AWS Firehose or another Kafka queue)
Project URL

https://github.com/ExpediaDotCom/haystack-pipes/tree/master/sample-key-extractor

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.expedia.www/sample-key-extractor/ -->
<dependency>
    <groupId>com.expedia.www</groupId>
    <artifactId>sample-key-extractor</artifactId>
    <version>2.0.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.expedia.www/sample-key-extractor/
implementation 'com.expedia.www:sample-key-extractor:2.0.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.expedia.www/sample-key-extractor/
implementation("com.expedia.www:sample-key-extractor:2.0.0")

Buildr:

'com.expedia.www:sample-key-extractor:jar:2.0.0'

Ivy:

<dependency org="com.expedia.www" name="sample-key-extractor" rev="2.0.0">
  <artifact name="sample-key-extractor" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.expedia.www', module='sample-key-extractor', version='2.0.0')
)

SBT:

libraryDependencies += "com.expedia.www" % "sample-key-extractor" % "2.0.0"

Leiningen:

[com.expedia.www/sample-key-extractor "2.0.0"]

Dependencies

compile (3)

Group : Artifact                            Type  Version
com.typesafe : config                       jar   1.3.1
com.google.protobuf : protobuf-java-util    jar   3.3.1
com.expedia.www : span-key-extractor        jar   2.0.0

Project Modules

There are no modules declared in this project.


haystack-pipes

Packages to send ("pipe") Haystack data to external sinks (like AWS Firehose or another Kafka queue)

[Figure: High Level Block Diagram]

The haystack-pipes unit delivers a human-friendly version of Haystack messages to zero or more "durable" locations for more permanent storage. Current "plugin" implementations are:

  1. kafka-producer: this package uses Kafka Streams to read the protobuf records from Kafka, transform them to JSON, and write them to another Kafka, typically a different Kafka installation than the one from which the protobuf records were read. The kafka-producer package uses the Kafka Producer API to write to Kafka.
  2. firehose-writer: this package uses Kafka Streams to read the protobuf records from Kafka, transform them to JSON, and write them to the Amazon Kinesis Data Firehose (an AWS service that facilitates loading streaming data into AWS). Note that its PutRecordBatch API accepts up to 500 records, with a maximum size of 4 MB for each put request; firehose-writer batches the records appropriately (a sketch of this batching logic appears after this list). Kinesis Firehose can be configured to deliver the data to other AWS services that facilitate data analysis, like Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.
  3. json-transformer: this package uses Kafka Streams to read the protobuf records from Kafka, transform them to JSON, and write them to another topic in Kafka.
  4. http-poster: this package uses Kafka Streams to read the protobuf records from Kafka, transform them to JSON, and send them to another service via an HTTP POST request.
  5. secret-detector: this package uses Kafka Streams to read the protobuf records from Kafka and search the tags of those records (the records are "Span" objects from the haystack-idl package) for "personal" data. This personal data is either PCI data (credit card numbers) or PII data (address, phone number, etc.); which kind of personal data to search for is under configuration control. The secret-detector uses the open source chlorine-finder package for detection. When a secret is found, information identifying the secret (but not the secret itself) is written back to Kafka. To minimize false positives (data flagged as secret that isn't really secret), a text file of whitelisted tags is stored in S3. The format of this text file is one or more newline-separated lines, each a semicolon-delimited four-tuple of fields from the Span: <finder name>;<service name>;<operation name>;<tag name> (a parsing sketch appears after this list). Configuration controls where this text file is found in S3 (i.e. in what bucket and under what key).
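
The PutRecordBatch limits noted in item 2 (at most 500 records and at most 4 MB per request) reduce to a simple grouping loop. Below is a minimal sketch of that batching logic, assuming the JSON records arrive as byte arrays; the class and method names are hypothetical, not the actual firehose-writer implementation. Each inner list it returns would then be sent as one PutRecordBatch call.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the PutRecordBatch grouping described above:
// at most 500 records and at most 4 MB of data per batch. Not the
// actual firehose-writer code.
public class FirehoseBatcher {
    private static final int MAX_RECORDS_PER_BATCH = 500;
    private static final int MAX_BYTES_PER_BATCH = 4 * 1024 * 1024; // 4 MB

    public static List<List<byte[]>> batch(final List<byte[]> jsonRecords) {
        final List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (final byte[] payload : jsonRecords) {
            final boolean tooMany = current.size() >= MAX_RECORDS_PER_BATCH;
            final boolean tooBig = currentBytes + payload.length > MAX_BYTES_PER_BATCH;
            if (!current.isEmpty() && (tooMany || tooBig)) {
                batches.add(current);            // flush the full batch
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += payload.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);                // flush the final partial batch
        }
        return batches;
    }
}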

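The whitelist format described in item 5 parses in a few lines. This too is a hypothetical sketch rather than the actual secret-detector code: it loads the semicolon-delimited four-tuples into a set and checks candidate findings against it.

import java.util.HashSet;
import java.util.Set;

// Hypothetical parser for the whitelist format described above: one
// <finder name>;<service name>;<operation name>;<tag name> entry per line.
public class WhitelistParser {
    public static Set<String> parse(final String whitelistFileContents) {
        final Set<String> whitelist = new HashSet<>();
        for (final String line : whitelistFileContents.split("\n")) {
            if (line.split(";").length == 4) {   // keep only well-formed four-tuples
                whitelist.add(line.trim());
            }
        }
        return whitelist;
    }

    // A potential secret is suppressed when its four-tuple is whitelisted.
    public static boolean isWhitelisted(final Set<String> whitelist, final String finder,
                                        final String service, final String operation,
                                        final String tag) {
        return whitelist.contains(finder + ";" + service + ";" + operation + ";" + tag);
    }
}
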
In all of the cases above, "transform to JSON" implies "tag flattening": the OpenTracing API specifies tags in a somewhat unfriendly format. For example, the following OpenTracing tags:

"tags":[{"key":"strKey","vStr":"tagValue"},
        {"key":"longKey","vLong":"987654321"},
        {"key":"doubleKey","vDouble":9876.54321},
        {"key":"boolKey","vBool":true},
        {"key":"bytesKey","vBytes":"AAEC/f7/"}]

will be converted to

"tags":{"strKey":"tagValue",
        "longKey":987654321,
        "doubleKey":9876.54321,
        "boolKey":true,
        "bytesKey":"AAEC/f7/"}}

by code in the Pipes commons module (a simplified sketch of this flattening appears after the list below). The commons module also contains other shared code that:

  1. reads Kafka configurations,
  2. facilitates creating and starting Kafka Streams,
  3. serializes Spans,
  4. provides shared constants to unit tests,
  5. changes environment variables to lower case for consumption by cfg4j (haystack-pipes uses cfg4j to read configuration files),
  6. starts polling for the Counters and Timers provided by haystack-metrics.
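
As a simplified illustration of the tag flattening shown above (not the actual commons-module code; plain maps stand in for the protobuf Span/Tag types):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Each OpenTracing tag carries its value under a type-specific field
// (vStr, vLong, vDouble, vBool, vBytes); flattening keeps only the tag
// key and the value itself. Hypothetical sketch, not the commons code.
public class TagFlattener {
    private static final String[] VALUE_FIELDS =
            {"vStr", "vLong", "vDouble", "vBool", "vBytes"};

    public static Map<String, Object> flatten(final List<Map<String, Object>> tags) {
        final Map<String, Object> flattened = new LinkedHashMap<>();
        for (final Map<String, Object> tag : tags) {
            final String key = (String) tag.get("key");
            for (final String valueField : VALUE_FIELDS) {
                final Object value = tag.get(valueField);
                if (value != null) {
                    // e.g. {"key":"strKey","vStr":"tagValue"} becomes "strKey":"tagValue"
                    flattened.put(key, value);
                    break;
                }
            }
        }
        return flattened;
    }
}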

Building

Cloning

From scratch

Since this repo contains haystack-idl as a submodule, a recursive clone of the haystack-pipes package is required:

git clone --recursive git@github.com:ExpediaDotCom/haystack-pipes.git .

From existing directory

If you have already cloned the haystack-pipes package (perhaps with an IDE that did not clone recursively as the command above instructs), or if you want to pick up a newer version of the haystack-idl package, run the following from your haystack-pipes directory:

git submodule update --init --recursive

Prerequisites:

  • Java 1.8
  • Maven 3.3.9 or higher
  • Docker 1.13 or higher

Build

Full build

For a full build, including unit tests, run (from the directory to which you cloned haystack-pipes):

make all