secor

Kafka to s3/gs/swift logs exporter

GroupId: com.pinterest
ArtifactId: secor
Last Version: 0.29
Type: jar
Description: secor (Kafka to s3/gs/swift logs exporter)
Project URL: https://github.com/pinterest/secor
Source Code Management: https://github.com/pinterest/secor

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.pinterest/secor/ -->
<dependency>
    <groupId>com.pinterest</groupId>
    <artifactId>secor</artifactId>
    <version>0.29</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.pinterest/secor/
implementation 'com.pinterest:secor:0.29'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.pinterest/secor/
implementation ("com.pinterest:secor:0.29")

Buildr:

'com.pinterest:secor:jar:0.29'

Ivy:

<dependency org="com.pinterest" name="secor" rev="0.29">
  <artifact name="secor" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.pinterest', module='secor', version='0.29')
)

SBT:

libraryDependencies += "com.pinterest" % "secor" % "0.29"

Leiningen:

[com.pinterest/secor "0.29"]

Dependencies

compile (40)

Group / Artifact Type Version
io.confluent : kafka-avro-serializer jar 2.0.1
org.apache.avro : avro jar 1.10.0
org.apache.parquet : parquet-avro jar 1.11.0
com.google.protobuf : protobuf-java jar 3.12.2
com.google.protobuf : protobuf-java-util jar 3.12.2
com.amazonaws : aws-java-sdk-s3 jar 1.11.821
com.amazonaws : aws-java-sdk-sts jar 1.11.821
net.java.dev.jets3t : jets3t jar 0.9.4
log4j : log4j jar 1.2.17
org.slf4j : slf4j-api jar 1.7.30
org.slf4j : jcl-over-slf4j jar 1.7.30
org.slf4j : slf4j-log4j12 jar 1.7.30
commons-configuration : commons-configuration jar 1.10
org.apache.hadoop : hadoop-common jar 2.9.2
org.apache.hadoop : hadoop-hdfs-client jar 2.9.2
org.apache.hadoop : hadoop-mapreduce-client-core jar 2.9.2
org.apache.hadoop : hadoop-aws jar 2.9.2
org.apache.hadoop : hadoop-openstack jar 2.9.2
org.apache.parquet : parquet-common jar 1.11.0
org.apache.parquet : parquet-encoding jar 1.11.0
org.apache.parquet : parquet-column jar 1.11.0
org.apache.parquet : parquet-hadoop jar 1.11.0
org.apache.parquet : parquet-protobuf jar 1.11.0
org.apache.parquet : parquet-thrift jar 1.11.0
org.apache.thrift : libthrift jar 0.12.0
org.apache.curator : curator-client jar 2.13.0
org.apache.curator : curator-framework jar 2.13.0
com.google.guava : guava jar 24.1.1-jre
net.minidev : json-smart jar 2.3
org.mockito : mockito-core jar 3.3.3
org.powermock : powermock-api-mockito2 jar 2.0.7
org.powermock : powermock-module-junit4 jar 2.0.7
org.msgpack : jackson-dataformat-msgpack jar 0.8.20
com.datadoghq : java-dogstatsd-client jar 2.10.2
com.google.cloud : google-cloud-storage jar 1.111.2
com.microsoft.azure : azure-storage jar 8.6.5
org.apache.orc : orc-core jar 1.6.3
io.micrometer : micrometer-registry-jmx jar 1.5.2
io.micrometer : micrometer-registry-statsd jar 1.5.2
io.micrometer : micrometer-registry-prometheus jar 1.5.2

test (1)

Group / Artifact Type Version
junit : junit jar 4.11

Project Modules

There are no modules declared in this project.

Pinterest Secor

Secor is a service that persists Kafka logs to Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, and OpenStack Swift.

Key features

  • strong consistency: as long as Kafka does not drop messages (e.g., due to an aggressive cleanup policy) before Secor can read them, each message is guaranteed to be saved in exactly one S3 file. This property is not compromised by the temporal inconsistency of S3 caused by its eventual consistency model,
  • fault tolerance: any component of Secor may crash at any point without compromising data integrity,
  • load distribution: Secor may be distributed across multiple machines,
  • horizontal scalability: scaling the system out to handle more load is as easy as starting extra Secor processes. The resource footprint can be reduced by killing any of the running Secor processes. Neither scaling up nor scaling down has any impact on data consistency,
  • output partitioning: Secor parses incoming messages and places them under partitioned S3 paths to enable direct import into systems like Hive. Day-, hour-, and minute-level partitions are supported,
  • configurable upload policies: commit points controlling when data is persisted in S3 are configured through size-based and time-based policies (e.g., upload data when the local buffer reaches 100MB and at least once per hour),
  • monitoring: metrics tracking various performance properties are exposed through Ostrich and Micrometer and optionally exported to OpenTSDB or StatsD,
  • customizability: an external log message parser may be loaded by updating the configuration,
  • event transformation: message-level transformations can be applied through a custom class,
  • Qubole interface: Secor connects to Qubole to add finalized output partitions to Hive tables.
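
The output partitioning and upload policy features above can be illustrated with a small, self-contained Java sketch. It is not Secor's implementation: the class name, method names, threshold constants, and the `dt=`/`hr=` path layout are all assumptions chosen for illustration, and the 100MB and one-hour limits simply mirror the example given in the feature list.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative sketch only; none of these names come from Secor itself.
public class SecorFeatureSketch {
    // Hypothetical thresholds mirroring the example in the feature list:
    // upload at 100MB of buffered data, or after one hour at the latest.
    public static final long MAX_BUFFER_BYTES = 100L * 1024 * 1024;
    public static final long MAX_AGE_SECONDS = 3600;

    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("HH").withZone(ZoneOffset.UTC);

    // An upload is triggered when either the size-based or the
    // time-based policy is satisfied.
    public static boolean shouldUpload(long bufferedBytes, long ageSeconds) {
        return bufferedBytes >= MAX_BUFFER_BYTES || ageSeconds >= MAX_AGE_SECONDS;
    }

    // Builds a Hive-style day/hour partition path from a message timestamp
    // (milliseconds since the epoch, interpreted in UTC). The layout is assumed.
    public static String partitionPath(String topic, long timestampMillis) {
        Instant ts = Instant.ofEpochMilli(timestampMillis);
        return topic + "/dt=" + DAY.format(ts) + "/hr=" + HOUR.format(ts);
    }

    public static void main(String[] args) {
        System.out.println(partitionPath("events", 0L));
        System.out.println(shouldUpload(150L * 1024 * 1024, 10));
        System.out.println(shouldUpload(1024, 10));
    }
}
```

Deriving the path purely from the message timestamp means a message that is re-read after a crash lands under the same partition, which is the kind of property that makes partitioned output directly importable into Hive.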

Release Notes

Release Notes for past versions can be found in RELEASE.md.

Setup/Configuration Guide

Setup and configuration instructions are available in README.setup.md.

Secor configuration for Kubernetes/GKE environment

Additional setup instructions for Kubernetes/GKE environments are available in README.kubernetes.md.

Detailed design

Design details are available in DESIGN.md.

License

Secor is distributed under the Apache License, Version 2.0.

Maintainers

Contributors

Companies who use Secor

Help

If you have any questions or comments, you can reach us at [email protected]

com.pinterest

Pinterest

Pinterest's Open Source Projects

Versions

Version
0.29
0.28
0.27
0.26
0.25
0.24
0.23
0.22
0.21
0.20
0.19
0.18
0.17
0.16
0.15
0.14
0.13
0.12
0.11
0.10
0.9
0.8
0.7
0.6
0.5
0.4
0.2
0.1