Elasticsearch Twitter River plugin

The Twitter river indexes the public twitter stream, aka the hose, and makes it searchable

License	The Apache Software License, Version 2.0
Categories	Search Business Logic Libraries Elasticsearch
GroupId	org.elasticsearch
ArtifactId	elasticsearch-river-twitter
Last Version	2.6.0
Release Date	Jun 11, 2015
Type	jar
Description	Elasticsearch Twitter River plugin. The Twitter river indexes the public Twitter stream, aka the hose, and makes it searchable.
Project URL	https://github.com/elastic/elasticsearch-river-twitter/
Source Code Management	http://github.com/elastic/elasticsearch-river-twitter

Download elasticsearch-river-twitter

Filename	Size
elasticsearch-river-twitter-2.6.0.pom
elasticsearch-river-twitter-2.6.0.zip	336 KB
elasticsearch-river-twitter-2.6.0-sources.jar	7 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/org.elasticsearch/elasticsearch-river-twitter/ -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-river-twitter</artifactId>
    <version>2.6.0</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/org.elasticsearch/elasticsearch-river-twitter/
implementation 'org.elasticsearch:elasticsearch-river-twitter:2.6.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/org.elasticsearch/elasticsearch-river-twitter/
implementation ("org.elasticsearch:elasticsearch-river-twitter:2.6.0")

Apache Buildr

'org.elasticsearch:elasticsearch-river-twitter:jar:2.6.0'

Apache Ivy

<dependency org="org.elasticsearch" name="elasticsearch-river-twitter" rev="2.6.0">
  <artifact name="elasticsearch-river-twitter" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='org.elasticsearch', module='elasticsearch-river-twitter', version='2.6.0')
)

Scala SBT

libraryDependencies += "org.elasticsearch" % "elasticsearch-river-twitter" % "2.6.0"

Leiningen

[org.elasticsearch/elasticsearch-river-twitter "2.6.0"]

Dependencies

compile (3)

Group / Artifact	Type	Version
org.elasticsearch : elasticsearch	jar	1.6.0
org.twitter4j : twitter4j-stream	jar	4.0.3
log4j : log4j	jar	1.2.17

test (4)

Group / Artifact	Type	Version
org.hamcrest : hamcrest-all	jar	1.3
com.carrotsearch.randomizedtesting : randomizedtesting-runner	jar	2.1.10
org.apache.lucene : lucene-test-framework	jar	4.10.4
org.elasticsearch : elasticsearch	test-jar	1.6.0

Project Modules

There are no modules declared in this project.

Important: This project has been discontinued since Elasticsearch 2.0.

Twitter River Plugin for Elasticsearch

The Twitter river indexes the public twitter stream, aka the hose, and makes it searchable.

Rivers are deprecated and will be removed in the future. Consider using the Logstash twitter input instead.

In order to install the plugin, run:

bin/plugin install elasticsearch/elasticsearch-river-twitter/2.6.0

After installing the plugin you need to restart elasticsearch.

You need to install a version matching your Elasticsearch version:

Elasticsearch	Twitter River Plugin	Docs
master	Build from source	See below
es-1.x	Build from source	2.7.0-SNAPSHOT
es-1.6	2.6.0	2.6.0
es-1.5	2.5.0	2.5.0
es-1.4	2.4.2	2.4.2
es-1.3	2.3.0	2.3.0
es-1.2	2.2.0	2.2.0
es-1.0	2.0.0	2.0.0
es-0.90	1.5.0	1.5.0

To build a SNAPSHOT version, you need to build it with Maven:

mvn clean install
plugin --install river-twitter \
       --url file:target/releases/elasticsearch-river-twitter-X.X.X-SNAPSHOT.zip

Prerequisites

You need an OAuth token to use the Twitter river. Follow the Twitter documentation; in short:

Log in to: https://dev.twitter.com/apps/
Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new You don't need a callback URL.
When done, click on Create my access token.
Open the OAuth tool tab and note the Consumer key, Consumer secret, Access token and Access token secret.

Create river

Creating the twitter river can be done using:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "oauth" : {
            "consumer_key" : "*** YOUR Consumer key HERE ***",
            "consumer_secret" : "*** YOUR Consumer secret HERE ***",
            "access_token" : "*** YOUR Access token HERE ***",
            "access_token_secret" : "*** YOUR Access token secret HERE ***"
        }
    },
    "index" : {
        "index" : "my_twitter_river",
        "type" : "status",
        "bulk_size" : 100,
        "flush_interval" : "5s",
        "retry_after" : "10s"
    }
}

The above lists all the options controlling the creation of a twitter river.

If you don't define index.index, the river name (my_twitter_river) is used as the default index name. If you don't define index.type, the default type, status, is used.
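The request above could also be prepared from a script. The following is a minimal Python sketch using only the standard library. It assumes a node reachable at http://localhost:9200 (an assumption, not something this document states), and the helper names build_river_meta and build_put_request are hypothetical.

```python
import json
import urllib.request


def build_river_meta(oauth, index_name=None, bulk_size=100,
                     flush_interval="5s", retry_after="10s"):
    """Build the _meta document for a twitter river (hypothetical helper)."""
    index = {
        "type": "status",
        "bulk_size": bulk_size,
        "flush_interval": flush_interval,
        "retry_after": retry_after,
    }
    # Leaving index.index out lets the river fall back to its own name.
    if index_name is not None:
        index["index"] = index_name
    return {"type": "twitter", "twitter": {"oauth": dict(oauth)}, "index": index}


def build_put_request(river_name, meta, base_url="http://localhost:9200"):
    """Prepare (but do not send) the PUT _river/<name>/_meta request."""
    return urllib.request.Request(
        url=f"{base_url}/_river/{river_name}/_meta",
        data=json.dumps(meta).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To actually send it, pass the prepared request to urllib.request.urlopen against a running node.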

Note that you can define any or all of your oauth settings in the elasticsearch.yml file on each node by prefixing each setting with river.twitter.:

river.twitter.oauth.consumer_key: "*** YOUR Consumer key HERE ***"
river.twitter.oauth.consumer_secret: "*** YOUR Consumer secret HERE ***"
river.twitter.oauth.access_token: "*** YOUR Access token HERE ***"
river.twitter.oauth.access_token_secret: "*** YOUR Access token secret HERE ***"

In that case, you can create the river using:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter"
}

You can also override any elasticsearch.yml setting in the river definition. A good practice could be to keep consumer_key and consumer_secret in elasticsearch.yml and pass the access_token and access_token_secret properties to the river.
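The precedence described here (river-level values win over node-level values) can be illustrated with a small Python sketch. This is not the plugin's code; resolve_oauth is a hypothetical helper, and the flat dictionary stands in for the node's elasticsearch.yml settings.

```python
def resolve_oauth(node_settings, river_meta):
    """Merge node-level and river-level oauth settings; river-level wins.

    node_settings: flat dict such as {"river.twitter.oauth.consumer_key": "..."}
    river_meta: the river _meta document as a nested dict
    """
    prefix = "river.twitter.oauth."
    # Start from whatever the node defines under the river.twitter.oauth. prefix.
    resolved = {
        key[len(prefix):]: value
        for key, value in node_settings.items()
        if key.startswith(prefix)
    }
    # Values given in the river definition override the node-level ones.
    resolved.update(river_meta.get("twitter", {}).get("oauth", {}))
    return resolved
```

With consumer_key and consumer_secret on the node and both access token properties in the river definition, the merged result contains all four values.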

By default, the twitter river reads a small random sample of all public statuses using the sample API.

You can, however, choose which type of stream to read:

sample: the default one
filter: track for text, users and locations. See Filtered Stream
user: listen to tweets in the authenticated user's timeline. See User Stream
firehose: all public statuses (restricted access)

For example:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "type" : "firehose"
    }
}

Note that if you define a filter (see next section), type will be automatically set to filter.

Tweets are indexed once bulk_size of them have accumulated (defaults to 100) or every flush_interval period (defaults to 5s), whichever comes first.
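The size-or-time flushing rule can be sketched in Python as follows. This only illustrates the behaviour described above and is not the plugin's implementation; the BulkBuffer class, its tick method and the injectable clock are assumptions made for the example.

```python
import time


class BulkBuffer:
    """Collect documents and flush by size or by elapsed time (illustrative only)."""

    def __init__(self, flush, bulk_size=100, flush_interval=5.0, clock=time.monotonic):
        self._flush = flush                  # callback that receives a list of documents
        self._bulk_size = bulk_size
        self._flush_interval = flush_interval
        self._clock = clock
        self._docs = []
        self._last_flush = clock()

    def add(self, doc):
        """Buffer one document; flush as soon as bulk_size is reached."""
        self._docs.append(doc)
        if len(self._docs) >= self._bulk_size:
            self.flush()

    def tick(self):
        """Call periodically; flushes pending documents once the interval has elapsed."""
        if self._docs and self._clock() - self._last_flush >= self._flush_interval:
            self.flush()

    def flush(self):
        """Hand the buffered documents to the callback and reset the timer."""
        if self._docs:
            self._flush(list(self._docs))
            self._docs.clear()
        self._last_flush = self._clock()
```

Passing the clock in as a parameter keeps the time-based branch easy to exercise without waiting for real seconds to pass.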

Filtered Stream

A filtered stream is also supported (as per the Twitter streaming API). It can be configured with tracks, follow, locations and language.

user_lists is a shortcut to follow all members of a public Twitter list, identified by the list owner and the list slug (the last part of the URI when you open the list in your browser).

Values use the same format as the Twitter API (a single comma-separated string) or JSON arrays. Here is an example:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "filter" : {
            "tracks" : "test,something,please",
            "follow" : "111,222,333",
            "user_lists" : "ownerScreenName1/slug1,ownerScreenName2/slug2",
            "locations" : "-122.75,36.8,-121.75,37.8,-74,40,-73,41",
            "language" : "fr,en"
        }
    }
}

Note that locations use GeoJSON order (longitude, latitude).

Note that if you want to use language filtering, you also need to define at least one of the tracks, follow or locations filters. Supported language identifiers follow BCP 47. You can filter on any language listed in Twitter Advanced Search.

Here is an array based configuration example:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "filter" : {
            "tracks" : ["test", "something"],
            "follow" : [111, 222, 333],
            "locations" : [ [-122.75,36.8], [-121.75,37.8], [-74,40], [-73,41]],
            "language" : [ "fr", "en" ]
        }
    }
}
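The two examples express the same filter in comma-separated and array form. As a rough illustration of how one maps to the other, here is a Python sketch; split_csv and locations_to_pairs are hypothetical helpers, not part of the plugin.

```python
def split_csv(value):
    """Turn "a,b,c" into ["a", "b", "c"]; lists are passed through as a copy."""
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return list(value)


def locations_to_pairs(value):
    """Turn "lon,lat,lon,lat,..." into [[lon, lat], ...] (longitude first)."""
    numbers = [float(item) for item in split_csv(value)]
    if len(numbers) % 2 != 0:
        raise ValueError("locations must contain an even number of coordinates")
    return [[numbers[i], numbers[i + 1]] for i in range(0, len(numbers), 2)]
```

Applied to the locations string from the first example, locations_to_pairs yields the four [longitude, latitude] points used in the array-based example.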

User Stream

A user stream is also supported (as per the Twitter streaming API). This stream returns tweets on the authenticated user's timeline. Here is a basic configuration example:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "type" : "user"
    }
}

Indexing RAW Twitter stream

By default, elasticsearch twitter river will convert tweets to an equivalent representation in elasticsearch. If you want to index RAW twitter JSON content without any transformation, you can set raw to true:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "raw" : true
    }
}

Note that you should consider creating a mapping for your tweets first. See the Twitter documentation on the raw Tweet format:

PUT my_twitter_river/status/_mapping
{
    "status" : {
        "properties" : {
            "text" : {"type" : "string", "analyzer" : "standard"}
        }
    }
}

Ignoring Retweets

If you don't want to index retweets (aka RT), set ignore_retweet to true (defaults to false):

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "ignore_retweet" : true
    }
}
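To make the effect concrete, here is a small Python sketch that drops retweets from a list of raw statuses. It relies on the fact that in Twitter's raw v1.1 status JSON a retweet carries a retweeted_status object. How the plugin itself detects retweets is not described in this document, so treat is_retweet and drop_retweets as hypothetical helpers.

```python
def is_retweet(raw_status):
    """A raw status is a retweet when it embeds the original as retweeted_status."""
    return "retweeted_status" in raw_status


def drop_retweets(statuses):
    """Keep only statuses that are not retweets."""
    return [status for status in statuses if not is_retweet(status)]
```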

Increase the schedule time to reconnect the river

It can happen that the river fails, thus closing the current connection to the Streaming API. Then, a new connection is scheduled by the river after 10s by default. If you want to manage this time, simply use the retry_after option, as in:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "index" : {
        "retry_after" : "30s"
    }
}
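The reconnect behaviour amounts to a retry loop with a fixed delay. The Python sketch below shows the idea only. The function name, the max_attempts limit and the use of ConnectionError are assumptions made for the example; this document does not say whether the river limits its retries.

```python
import time


def run_with_reconnect(connect, retry_after=10.0, max_attempts=3, sleep=time.sleep):
    """Call connect(); on failure wait retry_after seconds and try again.

    max_attempts is an addition for this sketch so the loop always terminates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise                # out of attempts: surface the last error
            sleep(retry_after)       # fixed delay before the next attempt
```

With retry_after set to 30, two failed connections followed by a success would wait 30 seconds twice before returning.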

Geo location points as array

By default, the Elasticsearch twitter river indexes the location field as an object with lat and lon properties. Set geo_as_array to true if you prefer to have the location indexed as an array [lon, lat].

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "geo_as_array" : true
    }
}
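The difference between the two location formats can be shown with a short Python sketch. to_geo_array and to_geo_object are hypothetical helpers for illustration; the key point is that the array form puts longitude first.

```python
def to_geo_array(location):
    """Convert {"lat": ..., "lon": ...} into [lon, lat] (longitude first)."""
    return [location["lon"], location["lat"]]


def to_geo_object(point):
    """Convert [lon, lat] back into {"lat": ..., "lon": ...}."""
    lon, lat = point
    return {"lat": lat, "lon": lon}
```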

Remove the river

If you need to stop the Twitter river, you have to remove it:

DELETE _river/my_twitter_river/

Using a proxy

You can define a proxy if you are using one:

PUT _river/my_twitter_river/_meta
{
    "type" : "twitter",
    "twitter" : {
        "proxy" : {
            "host": "host",
            "port": "port",
            "user": "proxy_user_if_any",
            "password": "proxy_password_if_any"
        }
    }
}

You can also define proxy settings in the elasticsearch.yml file on each node by prefixing each setting with river.twitter.:

river.twitter.proxy.host: "host"
river.twitter.proxy.port: "port"
river.twitter.proxy.user: "proxy_user_if_any"
river.twitter.proxy.password: "proxy_password_if_any"

Sample document

Here is what a document could look like when using this river (without the raw option):

{
   "text":"This is a text",
   "created_at":"2015-01-26T15:22:35.000Z",
   "source":"<a href=\"http://www.twitter.com\" rel=\"nofollow\">Twitter for Windows Phone</a>",
   "truncated":false,
   "language":"en",
   "mention":[

   ],
   "retweet_count":0,
   "hashtag":[

   ],
   "location":[
      78.418407,
      17.431913
   ],
   "place":{
      "id":"243cc16f6417a167",
      "name":"Hyderabad",
      "type":"city",
      "full_name":"Hyderabad, Andhra Pradesh",
      "street_address":null,
      "country":"India",
      "country_code":"IN",
      "url":"https://api.twitter.com/1.1/geo/id/243cc16f6417a167.json"
   },
   "link":[

   ],
   "user":{
      "id":1111111111,
      "name":"User Name",
      "screen_name":"twitter_handle",
      "location":"A full text location description",
      "description":"A description",
      "profile_image_url":"http://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg",
      "profile_image_url_https":"https://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg"
   }
}

Tests

Integration tests in this plugin require a working Twitter account and are therefore disabled by default. You need to create your credentials as explained in Prerequisites.

To enable the tests, prepare a config file elasticsearch.yml with the following content:

river:
  twitter:
      oauth:
         consumer_key: "your_consumer_key"
         consumer_secret: "your_consumer_secret"
         access_token: "your_access_token"
         access_token_secret: "your_access_token_secret"

Replace all occurrences of your_consumer_key, your_consumer_secret, your_access_token and your_access_token_secret with your settings.

To run the tests:

mvn -Dtests.twitter=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test

Note that if you want to test User Stream, you need to define write rights for your twitter application.

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.


Versions

Version
2.6.0 Jun 11, 2015
2.5.0 Mar 31, 2015
2.4.2 Feb 12, 2015
2.4.1 Nov 5, 2014
2.4.0 Oct 9, 2014
2.3.0 Aug 7, 2014
2.2.0 Jul 23, 2014
2.0.0 Mar 8, 2014
2.0.0.RC1 Jan 15, 2014
1.5.0 Mar 8, 2014
1.4.0 Jun 12, 2013
1.3.0 Jun 3, 2013
1.2.0 Dec 6, 2012
1.1.0 Feb 7, 2012
1.0.0 Dec 5, 2011
0.18.7 Jan 10, 2012
0.18.6 Dec 19, 2011
0.18.5 Nov 29, 2011
0.18.4 Nov 16, 2011
0.18.3 Nov 16, 2011
0.18.2 Oct 27, 2011
0.18.1 Oct 27, 2011
0.18.0 Oct 26, 2011
0.17.10 Nov 16, 2011
0.17.9 Oct 20, 2011
0.17.8 Oct 7, 2011
0.17.7 Sep 19, 2011
0.17.6 Aug 13, 2011
0.17.5 Aug 12, 2011
0.17.4 Aug 5, 2011
0.17.3 Aug 4, 2011
0.17.2 Jul 27, 2011
0.17.1 Jul 21, 2011
0.17.0 Jul 19, 2011
0.16.5 Jul 26, 2011
0.16.4 Jul 14, 2011
0.16.3 Jul 8, 2011
0.16.2 Jun 1, 2011
0.16.1 May 12, 2011
0.16.0 Apr 24, 2011
0.15.2 Mar 7, 2011
0.15.1 Mar 1, 2011
0.15.0 Feb 18, 2011
0.14.4 Feb 1, 2011
0.14.3 Jan 24, 2011
0.14.2 Jan 6, 2011
0.14.1 Dec 29, 2010
0.14.0 Dec 27, 2010
0.13.1 Dec 3, 2010
0.13.0 Nov 18, 2010
0.12.1 Oct 27, 2010
0.12.0 Oct 19, 2010
0.11.0 Sep 29, 2010
