vertx-datacollector

A framework to collect and post-process data from any source

Categories: Data
GroupId: info.pascalkrause
ArtifactId: vertx-datacollector
Last Version: 0.0.6
Type: jar
Description: vertx-datacollector. A framework to collect and post-process data from any source
Project URL: https://github.com/caspal/vertx-datacollector
Source Code Management: https://github.com/caspal/vertx-datacollector

How to add to project

Maven

<!-- https://jarcasting.com/artifacts/info.pascalkrause/vertx-datacollector/ -->
<dependency>
    <groupId>info.pascalkrause</groupId>
    <artifactId>vertx-datacollector</artifactId>
    <version>0.0.6</version>
</dependency>

Gradle (Groovy DSL)

// https://jarcasting.com/artifacts/info.pascalkrause/vertx-datacollector/
implementation 'info.pascalkrause:vertx-datacollector:0.0.6'

Gradle (Kotlin DSL)

// https://jarcasting.com/artifacts/info.pascalkrause/vertx-datacollector/
implementation("info.pascalkrause:vertx-datacollector:0.0.6")

Buildr

'info.pascalkrause:vertx-datacollector:jar:0.0.6'

Ivy

<dependency org="info.pascalkrause" name="vertx-datacollector" rev="0.0.6">
  <artifact name="vertx-datacollector" type="jar" />
</dependency>

Grape

@Grapes(
    @Grab(group='info.pascalkrause', module='vertx-datacollector', version='0.0.6')
)

SBT

libraryDependencies += "info.pascalkrause" % "vertx-datacollector" % "0.0.6"

Leiningen

[info.pascalkrause/vertx-datacollector "0.0.6"]

Dependencies

compile (4)

Group / Artifact                      Type  Version
io.vertx : vertx-core                 jar   [3.5.1,)
io.vertx : vertx-service-proxy        jar   [3.5.1,)
io.vertx : vertx-codegen              jar   [3.5.1,)
io.dropwizard.metrics : metrics-core  jar   4.0.2

test (3)

Group / Artifact          Type  Version
io.vertx : vertx-unit     jar   [3.5.1,)
junit : junit             jar   4.12
com.google.truth : truth  jar   0.39

Project Modules

There are no modules declared in this project.

vertx-datacollector

A framework to collect and post-process data from any source.

Import

Maven

<dependency>
  <groupId>info.pascalkrause</groupId>
  <artifactId>vertx-datacollector</artifactId>
  <version>0.0.6</version>
  <scope>compile</scope>
</dependency>

Gradle

implementation 'info.pascalkrause:vertx-datacollector:0.0.6'

Get Started

CollectorJob

The first step is to implement the actual collector job (e.g. crawling a dataset from a website). The collection logic belongs in the handler returned by the collect() method; the handler receives a Future, which it must complete with the result. The handler is executed on a separate worker thread, so blocking operations are allowed here.

public Handler<Future<CollectorJobResult>> collect(String requestId, JsonObject feature);

After the collection step is done, the result can be post-processed (e.g. written to a database) in the handler returned by the postCollectAction() method. This handler may also perform blocking operations.

public Handler<Future<CollectorJobResult>> postCollectAction(AsyncResult<CollectorJobResult> result);
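The two-step flow described above can be sketched with plain JDK types. This is only an illustration of the pattern, not the library's API: the Job interface, the run and runOnWorker helpers, and the sample job are hypothetical stand-ins, CompletableFuture takes the place of the Vert.x Future, and the post-collect step receives only a successful result, whereas the real method receives an AsyncResult.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class TwoPhaseJobSketch {

    // Hypothetical stand-in for the job contract; NOT the library's CollectorJob.
    interface Job<R> {
        // Returns a handler that completes the given future with the collected data.
        Consumer<CompletableFuture<R>> collect(String requestId, String feature);

        // Returns a handler that post-processes an already collected result.
        Consumer<CompletableFuture<R>> postCollect(R collected);
    }

    // Runs one handler on the worker thread and waits until its future completes.
    static <R> R runOnWorker(ExecutorService worker, Consumer<CompletableFuture<R>> handler)
            throws Exception {
        CompletableFuture<R> future = new CompletableFuture<>();
        worker.execute(() -> {
            try {
                handler.accept(future); // blocking work is acceptable on the worker
            } catch (RuntimeException e) {
                future.completeExceptionally(e);
            }
        });
        return future.get();
    }

    // Executes the collect step first, then feeds its result into the post-collect step.
    static <R> R run(Job<R> job, String requestId, String feature) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            R collected = runOnWorker(worker, job.collect(requestId, feature));
            return runOnWorker(worker, job.postCollect(collected));
        } finally {
            worker.shutdown();
        }
    }

    // Hypothetical job: "collects" a string and marks it as stored afterwards.
    static Job<String> sampleJob() {
        return new Job<String>() {
            @Override
            public Consumer<CompletableFuture<String>> collect(String requestId, String feature) {
                return future -> future.complete(requestId + ":" + feature);
            }

            @Override
            public Consumer<CompletableFuture<String>> postCollect(String collected) {
                return future -> future.complete(collected + " (stored)");
            }
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(sampleJob(), "req-1", "temperature"));
    }
}
```

In the sketch, each step hands back a handler rather than a value, so the caller decides on which thread the handler runs, which is what makes blocking work inside it safe.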

DataCollectorServiceVerticle

After implementing the CollectorJob, the verticle can be deployed. Its constructor takes the following parameters:

  • ebAddress: The eventbus address
  • job: The job which will be processed in the CollectorJobExecutor
  • workerPoolSize: The pool size of the CollectorJobExecutor
  • queueSize: The queue size of CollectorJob requests
  • enableMetrics: Enables metrics for the DataCollectorService

DataCollectorServiceVerticle verticle = new DataCollectorServiceVerticle(
  ebAddress, job, workerPoolSize, queueSize, enableMetrics);

vertx.deployVerticle(verticle);
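How workerPoolSize and queueSize interact is not spelled out above. As an analogy only, the following sketch shows how a fixed worker pool with a bounded request queue behaves in the plain JDK: once all workers are busy and the queue is full, further requests are rejected. Whether the CollectorJobExecutor rejects, defers, or reports such requests differently is an assumption not confirmed here; the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedWorkerPoolSketch {

    // Submits `requests` long-running tasks and returns how many were accepted.
    static int countAccepted(int workerPoolSize, int queueSize, int requests)
            throws InterruptedException {
        // Fixed number of workers plus a bounded queue; the default policy rejects overflow.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workerPoolSize, workerPoolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a blocking collector job
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    accepted++;
                } catch (RejectedExecutionException e) {
                    // All workers are busy and the queue is full.
                }
            }
        } finally {
            release.countDown(); // let the blocked jobs finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        // 2 workers + 3 queue slots: 5 of 10 simultaneous requests are accepted.
        System.out.println(countAccepted(2, 3, 10));
    }
}
```

Under this model the number of requests that can be in flight at once is workerPoolSize + queueSize, which may be a useful starting point when choosing the two values.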

DataCollectorService

Once the verticle has been deployed successfully, a DataCollectorService can connect to it through the event bus address. A list of the methods offered by the DataCollectorService can be found here.

String ebAddress = "addressOfCollectorVerticle";
DataCollectorServiceFactory factory = new DataCollectorServiceFactory(vertx, ebAddress);

DataCollectorService dcs = factory.create();
// or with DeliveryOptions
DeliveryOptions delOpts = new DeliveryOptions .....
DataCollectorService dcs = factory.create(delOpts);

DataCollectorServiceClient

The DataCollectorService is a Vert.x service proxy, so its interface must follow certain restrictions that allow the service to be translated into other languages. The DataCollectorServiceClient is a Java client that acts as a facade for the DataCollectorService: it offers higher-level functions and performs Java-specific conversions, e.g. error transformation. A list of the methods offered by the DataCollectorServiceClient can be found here.

DataCollectorService dcs = .....
DataCollectorServiceClient dcsc = new DataCollectorServiceClient(dcs);
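The facade idea, including error transformation, can be illustrated with plain JDK types. Everything in this sketch is a hypothetical stand-in: Service, Client, ServiceFailure, QueueFullException and the QUEUE_FULL code are invented for illustration and are not the library's classes or error codes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class ClientFacadeSketch {

    static final int QUEUE_FULL = 1; // hypothetical error code

    // Hypothetical language-neutral failure: only a numeric code and a message.
    static class ServiceFailure extends RuntimeException {
        final int code;

        ServiceFailure(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Hypothetical typed exception that the Java facade exposes instead.
    static class QueueFullException extends RuntimeException {
        QueueFullException(String message) {
            super(message);
        }
    }

    // Stand-in for the proxy service; NOT the library's DataCollectorService.
    interface Service {
        CompletableFuture<String> collect(String feature);
    }

    // Facade: delegates to the service and converts generic failures to typed ones.
    static class Client {
        private final Service service;

        Client(Service service) {
            this.service = service;
        }

        CompletableFuture<String> collect(String feature) {
            return service.collect(feature).handle((result, error) -> {
                if (error == null) {
                    return result;
                }
                Throwable cause = error instanceof CompletionException && error.getCause() != null
                        ? error.getCause() : error;
                if (cause instanceof ServiceFailure && ((ServiceFailure) cause).code == QUEUE_FULL) {
                    throw new QueueFullException(cause.getMessage());
                }
                throw new CompletionException(cause); // unknown failures pass through
            });
        }
    }

    // Calls the facade and describes the outcome as text.
    static String describe(Service service, String feature) {
        try {
            return new Client(service).collect(feature).join();
        } catch (CompletionException e) {
            return e.getCause().getClass().getSimpleName() + ": " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        Service busy = feature -> {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new ServiceFailure(QUEUE_FULL, "queue is full"));
            return failed;
        };
        System.out.println(describe(busy, "temperature"));
    }
}
```

The service interface stays simple enough to be expressed in any language, while Java callers of the facade can catch a specific exception type instead of inspecting error codes.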

Architecture

[Figure: architecture diagram]

JavaDoc

The latest JavaDoc can be found here.

Run tests

./gradlew test

Contribute

We use Gerrit, so PRs on GitHub will probably be overlooked. Please use GerritHub.io to contribute changes. The project name is caspal/vertx-datacollector.

Code Style

  1. Encoding must be in UTF-8.
  2. Every change must have a commit message.
  3. Line endings must be LF (Linux).
  4. The maximum length of a line should be between 80 and 120 characters.
  5. Use spaces instead of tabs.
  6. Use 4 spaces for indentation.
  7. No trailing whitespace.
  8. Avoid unnecessary empty lines.
  9. Adapt your code to the surroundings.
  10. Follow the default language style guide.

An Eclipse formatter can be found in the resources folder.

Versions

Version
0.0.6
0.0.5
0.0.4
0.0.3
0.0.2
0.0.1