elasticsearch-river-github

Github River for ElasticSearch

License

Categories

Github Development Tools Version Controls Search Business Logic Libraries Elasticsearch
GroupId

com.ubervu
ArtifactId

elasticsearch-river-github
Last Version

1.7.1
Release Date

Type

jar
Description

elasticsearch-river-github
Github River for ElasticSearch
Source Code Management

http://github.com/ubervu/elasticsearch-river-github

Download elasticsearch-river-github

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.ubervu/elasticsearch-river-github/ -->
<dependency>
    <groupId>com.ubervu</groupId>
    <artifactId>elasticsearch-river-github</artifactId>
    <version>1.7.1</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.ubervu/elasticsearch-river-github/
implementation 'com.ubervu:elasticsearch-river-github:1.7.1'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.ubervu/elasticsearch-river-github/
implementation ("com.ubervu:elasticsearch-river-github:1.7.1")

Buildr:

'com.ubervu:elasticsearch-river-github:jar:1.7.1'

Ivy:

<dependency org="com.ubervu" name="elasticsearch-river-github" rev="1.7.1">
  <artifact name="elasticsearch-river-github" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.ubervu', module='elasticsearch-river-github', version='1.7.1')
)

SBT:

libraryDependencies += "com.ubervu" % "elasticsearch-river-github" % "1.7.1"

Leiningen:

[com.ubervu/elasticsearch-river-github "1.7.1"]

Dependencies

compile (4)

Group / Artifact Type Version
org.elasticsearch : elasticsearch jar 1.0.1
commons-io : commons-io jar 2.4
commons-codec : commons-codec jar 1.6
com.google.code.gson : gson jar 2.2.2

runtime (1)

Group / Artifact Type Version
log4j : log4j jar 1.2.16

Project Modules

There are no modules declared in this project.

elasticsearch-river-github

Elasticsearch river for GitHub data. Fetches all of the following for a given GitHub repo:

Works for private repos as well if you provide authentication.

## Easy install

Assuming you have elasticsearch's bin folder in your PATH:

plugin -i com.ubervu/elasticsearch-river-github/1.7.1

Otherwise, you have to find the directory yourself. It should be /usr/share/elasticsearch/bin on Ubuntu.

## Adding the river

curl -XPUT localhost:9200/_river/my_gh_river/_meta -d '{
    "type": "github",
    "github": {
        "owner": "gabrielfalcao",
        "repository": "lettuce",
        "interval": 60,
        "authentication": {
            "username": "MYUSER",
            "password": "MYPASSWORD"
        },
        "endpoint": "https://api.somegithub.com"
    }
}'

endpoint is optional. Use it only when the repository is hosted somewhere other than github.com.

interval is optional, is given in seconds, and controls how often the river looks for new data. Since 1.7.1 the default value has been reduced to one minute, as we now only load issues and events that have changed, which should decrease API calls and improve the time to update quite significantly. The actual polling interval is also bounded by GitHub's minimum allowed polling interval, which is normally 60 seconds but may increase when servers are busy.
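One way to picture how the configured interval and GitHub's minimum interact is to take the larger of the two. The following is a minimal sketch, not the river's actual code: the function name is hypothetical, and it assumes the minimum is the value GitHub reports in the X-Poll-Interval response header of its Events API.

```python
def effective_interval(configured_seconds, github_poll_interval_seconds=60):
    """Return the number of seconds to wait between two polls.

    Assumption: the slower of the two values wins, so a busy GitHub
    (larger minimum) stretches the configured interval.
    """
    return max(configured_seconds, github_poll_interval_seconds)


# Configured value below, equal to, and above GitHub's minimum.
print(effective_interval(30, 60))
print(effective_interval(60, 60))
print(effective_interval(60, 120))
```

Under this assumption, setting interval below 60 has no effect in practice, because GitHub's minimum takes over.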

authentication is optional and helps with the API rate limit (5000 requests/hour instead of 60 requests/hour) and when accessing private data. You can use your own GitHub credentials or a token. When using a token, fill in the token as the username and x-oauth-basic as the password, as the docs mention.
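The two authentication styles differ only in what goes into the username and password fields. As a rough sketch, the request body could be assembled as shown here. The helper name river_meta and its parameters are hypothetical and exist only for illustration. The field names are the ones used in the curl example.

```python
import json


def river_meta(owner, repository, interval=None, username=None,
               password=None, token=None, endpoint=None):
    """Build the _meta document for a GitHub river as a plain dict."""
    github = {"owner": owner, "repository": repository}
    if interval is not None:
        github["interval"] = interval
    if token is not None:
        # Token style: the token is the username, the password is fixed.
        github["authentication"] = {
            "username": token,
            "password": "x-oauth-basic",
        }
    elif username is not None:
        # Credential style: plain GitHub username and password.
        github["authentication"] = {
            "username": username,
            "password": password,
        }
    if endpoint is not None:
        github["endpoint"] = endpoint
    return {"type": "github", "github": github}


# "MYTOKEN" is a placeholder, not a real token.
body = json.dumps(
    river_meta("gabrielfalcao", "lettuce", interval=60, token="MYTOKEN"),
    indent=4,
)
print(body)
```

The printed JSON can be passed to curl with -d in place of the hand-written body. Optional keys are omitted when they are not set.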

If you do not use authentication, you may want to set interval to a higher value, like 900 (every 15 minutes), as the GitHub rate limit will probably be exceeded when using low values. A high interval is not recommended if you require the GitHub events without gaps, as GitHub only allows access to the last 300 events. In that case, authenticating is highly recommended. This will probably change in a later version, at least for repositories without too much traffic, as we should be able to check for changes before loading most types of entries.
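The trade-off between interval and rate limit is simple arithmetic. The sketch below uses the two limits quoted above. The number of API requests the river makes per poll is not stated in this document, so requests_per_poll is a hypothetical parameter and the value 5 is only an example.

```python
UNAUTHENTICATED_LIMIT = 60    # requests per hour without authentication
AUTHENTICATED_LIMIT = 5000    # requests per hour with authentication


def requests_per_hour(interval_seconds, requests_per_poll):
    """Estimate hourly API usage for a given polling interval."""
    polls_per_hour = 3600 / interval_seconds
    return polls_per_hour * requests_per_poll


def fits_rate_limit(interval_seconds, requests_per_poll, authenticated):
    """Check whether the estimated usage stays within the hourly limit."""
    limit = AUTHENTICATED_LIMIT if authenticated else UNAUTHENTICATED_LIMIT
    return requests_per_hour(interval_seconds, requests_per_poll) <= limit


# Hypothetical load of 5 requests per poll.
print(fits_rate_limit(60, 5, authenticated=False))   # 300 per hour against a limit of 60
print(fits_rate_limit(900, 5, authenticated=False))  # 20 per hour against a limit of 60
print(fits_rate_limit(60, 5, authenticated=True))    # 300 per hour against a limit of 5000
```

With these example numbers, the one-minute default only fits within the limit when authenticated, while a 15-minute interval fits either way.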

## Deleting the river

curl -XDELETE localhost:9200/_river/my_gh_river

## Indexes and types

The data will be stored in an index of format "%s&%s" % (owner, repo), e.g. gabrielfalcao&lettuce.

For every API event type, there will be an elasticsearch type of the same name, e.g. ForkEvent.

Issue data will be stored with the IssueData type. Pull request data will be stored with the PullRequestData type. Milestone data will be stored with the MilestoneData type.
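Because the index name contains an ampersand, it needs care when used in a shell command or URL. The following sketch derives the index name and a search URL for one type. The helper names and the localhost:9200 host are assumptions for illustration, and the /index/type/_search path assumes the Elasticsearch 1.x API that this plugin depends on.

```python
from urllib.parse import quote


def index_name(owner, repo):
    """Index name as described above: owner and repo joined by '&'."""
    return "%s&%s" % (owner, repo)


def search_url(owner, repo, type_name, host="http://localhost:9200"):
    """Build a search URL for one type, percent-encoding the '&'."""
    index = quote(index_name(owner, repo), safe="")
    return "%s/%s/%s/_search" % (host, index, type_name)


print(index_name("gabrielfalcao", "lettuce"))
print(search_url("gabrielfalcao", "lettuce", "ForkEvent"))
print(search_url("gabrielfalcao", "lettuce", "IssueData"))
```

Percent-encoding the ampersand as %26 also avoids the shell treating it as a background operator when the URL is passed to curl unquoted.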

com.ubervu

uberVU

Instant Brand Insights powered by Social Media

Versions

Version
1.7.1
1.6.3
1.6.2
1.6.1
1.6.0
1.5.1
1.5.0
1.4.2
1.4.1
1.4.0
1.3.0
1.2.1
1.2.0
1.1.0
1.0.0