Elasticsearch Wikipedia River plugin

The Wikipedia River plugin allows you to index Wikipedia.

License: Apache License, Version 2.0
Categories: Search Business Logic Libraries Elasticsearch
GroupId: org.elasticsearch
ArtifactId: elasticsearch-river-wikipedia
Last Version: 2.6.0
Type: jar
Description: Elasticsearch Wikipedia River plugin. The Wikipedia River plugin allows you to index Wikipedia.
Project URL: https://github.com/elastic/elasticsearch-river-wikipedia/
Source Code Management: http://github.com/elastic/elasticsearch-river-wikipedia

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/org.elasticsearch/elasticsearch-river-wikipedia/ -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-river-wikipedia</artifactId>
    <version>2.6.0</version>
</dependency>

Gradle (Groovy):

// https://jarcasting.com/artifacts/org.elasticsearch/elasticsearch-river-wikipedia/
implementation 'org.elasticsearch:elasticsearch-river-wikipedia:2.6.0'

Gradle (Kotlin):

// https://jarcasting.com/artifacts/org.elasticsearch/elasticsearch-river-wikipedia/
implementation ("org.elasticsearch:elasticsearch-river-wikipedia:2.6.0")

Buildr:

'org.elasticsearch:elasticsearch-river-wikipedia:jar:2.6.0'

Ivy:

<dependency org="org.elasticsearch" name="elasticsearch-river-wikipedia" rev="2.6.0">
  <artifact name="elasticsearch-river-wikipedia" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='org.elasticsearch', module='elasticsearch-river-wikipedia', version='2.6.0')
)

SBT:

libraryDependencies += "org.elasticsearch" % "elasticsearch-river-wikipedia" % "2.6.0"

Leiningen:

[org.elasticsearch/elasticsearch-river-wikipedia "2.6.0"]

Dependencies

compile (2)

Group / Artifact Type Version
org.elasticsearch : elasticsearch jar 1.6.0
log4j : log4j jar 1.2.17

test (4)

Group / Artifact Type Version
org.hamcrest : hamcrest-all jar 1.3
com.carrotsearch.randomizedtesting : randomizedtesting-runner jar 2.1.14
org.apache.lucene : lucene-test-framework jar 4.10.4
org.elasticsearch : elasticsearch test-jar 1.6.0

Project Modules

There are no modules declared in this project.

Important: This project was discontinued as of Elasticsearch 2.0.


Wikipedia River Plugin for Elasticsearch

The Wikipedia River plugin allows you to index Wikipedia.

Rivers are deprecated and will be removed in the future. Have a look at stream2es.

In order to install the plugin, run:

bin/plugin install elasticsearch/elasticsearch-river-wikipedia/2.6.0

You need to install a version matching your Elasticsearch version:

Elasticsearch    Wikipedia River Plugin    Docs
master           Build from source         See below
es-1.x           Build from source         2.7.0-SNAPSHOT
es-1.6           2.6.0                     2.6.0
es-1.5           2.5.0                     2.5.0
es-1.4           2.4.1                     2.4.1
es-1.3           2.3.0                     2.3.0
es-1.2           2.2.0                     2.2.0
es-1.0           2.0.0                     2.0.0
es-0.90          1.3.0                     1.3.0
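
The version matching above can be sketched as a small lookup. This is an illustrative helper, not part of the plugin: the names PLUGIN_FOR_ES, plugin_version_for and install_command are hypothetical, and the install command for older releases is assumed to follow the same pattern as the 2.6.0 command shown earlier.

```python
# Released plugin versions copied from the compatibility table;
# keys are Elasticsearch release lines (major.minor).
PLUGIN_FOR_ES = {
    "1.6": "2.6.0",
    "1.5": "2.5.0",
    "1.4": "2.4.1",
    "1.3": "2.3.0",
    "1.2": "2.2.0",
    "1.0": "2.0.0",
    "0.90": "1.3.0",
}


def plugin_version_for(es_version):
    """Return the plugin release for an Elasticsearch version such as '1.4.2'.

    Returns None when the table lists no released plugin for that line.
    """
    release_line = ".".join(es_version.split(".")[:2])
    return PLUGIN_FOR_ES.get(release_line)


def install_command(es_version):
    """Build the install command (assumed pattern) for a given Elasticsearch version."""
    version = plugin_version_for(es_version)
    if version is None:
        raise ValueError("no released plugin listed for Elasticsearch " + es_version)
    return "bin/plugin install elasticsearch/elasticsearch-river-wikipedia/" + version
```

Release lines that the table does not list, such as 2.0, yield None rather than a guess.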

To build a SNAPSHOT version, you need to build it with Maven:

mvn clean install
plugin --install river-wikipedia \
       --url file:target/releases/elasticsearch-river-wikipedia-X.X.X-SNAPSHOT.zip

Create river

A simple river to index Wikipedia (English pages). Create it using:

curl -XPUT localhost:9200/_river/my_river/_meta -d '
{
    "type" : "wikipedia"
}
'

By default, the river downloads the latest Wikipedia dump. The source can be changed using:

{
    "type" : "wikipedia",
    "wikipedia" : {
        "url" : "url to link to wikipedia dump"
    }
}

The index name defaults to the river name, and the type defaults to page. Both can be changed in the index section:

{
    "type" : "wikipedia",
    "index" : {
        "index" : "my_index",
        "type" : "my_type"
    }
}

Since 1.3.0, the default bulk size is 100, a bulk is flushed every 5s, and at most 1 bulk request is executed concurrently. You can change these settings in the index section:

{
    "type" : "wikipedia",
    "index" : {
        "index" : "my_index",
        "type" : "my_type",
        "bulk_size" : 1000,
        "flush_interval" : "1s",
        "max_concurrent_bulk" : 3
    }
}
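
The river settings described so far can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; build_river_meta and create_river are hypothetical helper names, and the host and river name defaults simply mirror the curl example (localhost:9200, my_river).

```python
import json
import urllib.request


def build_river_meta(dump_url=None, index=None, doc_type=None,
                     bulk_size=None, flush_interval=None,
                     max_concurrent_bulk=None):
    """Build the river _meta document; omitted options keep the plugin defaults."""
    meta = {"type": "wikipedia"}
    if dump_url is not None:
        meta["wikipedia"] = {"url": dump_url}
    index_section = {
        "index": index,
        "type": doc_type,
        "bulk_size": bulk_size,
        "flush_interval": flush_interval,
        "max_concurrent_bulk": max_concurrent_bulk,
    }
    # Keep only the settings the caller actually provided.
    index_section = {k: v for k, v in index_section.items() if v is not None}
    if index_section:
        meta["index"] = index_section
    return meta


def create_river(meta, river_name="my_river", host="http://localhost:9200"):
    """PUT the _meta document, mirroring the curl call (needs a running node)."""
    url = "{}/_river/{}/_meta".format(host, river_name)
    request = urllib.request.Request(
        url, data=json.dumps(meta).encode("utf-8"), method="PUT")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, build_river_meta(index="my_index", doc_type="my_type", bulk_size=1000, flush_interval="1s", max_concurrent_bulk=3) produces the same document as the last JSON example, and passing it to create_river registers the river on a local node.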

Mapping

By default, the Wikipedia river generates the following mapping:

{
   "page": {
      "properties": {
         "category": {
            "type": "string"
         },
         "disambiguation": {
            "type": "boolean"
         },
         "link": {
            "type": "string"
         },
         "redirect": {
            "type": "boolean"
         },
         "redirect_page": {
            "type": "string"
         },
         "special": {
            "type": "boolean"
         },
         "stub": {
            "type": "boolean"
         },
         "text": {
            "type": "string"
         },
         "title": {
            "type": "string"
         }
      }
   }
}
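
To illustrate how the fields in this mapping might be queried, here is a minimal sketch that builds a search body matching on title and, optionally, excluding redirect pages. It assumes the Elasticsearch 1.x query DSL (the filtered query); build_title_query is a hypothetical helper name and is not part of the plugin.

```python
def build_title_query(text, include_redirects=False, size=10):
    """Build a 1.x-style search body for indexed Wikipedia pages.

    Uses the `title` (string) and `redirect` (boolean) fields of the mapping.
    """
    match_title = {"match": {"title": text}}
    if include_redirects:
        query = match_title
    else:
        # Assumption: Elasticsearch 1.x `filtered` query with a term filter.
        query = {
            "filtered": {
                "query": match_title,
                "filter": {"term": {"redirect": False}},
            }
        }
    return {"query": query, "size": size}
```

With the default index and type names, the resulting body would be sent to /my_river/page/_search, since the index name defaults to the river name and the type defaults to page.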

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

Versions

Version
2.6.0
2.5.0
2.4.1
2.4.0
2.3.0
2.2.0
2.0.0
2.0.0.RC1
1.3.0
1.2.0
1.1.0
1.0.0
0.18.7
0.18.6
0.18.5
0.18.4
0.18.3
0.18.2
0.18.1
0.18.0
0.17.10
0.17.9
0.17.8
0.17.7
0.17.6
0.17.5
0.17.4
0.17.3
0.17.2
0.17.1
0.17.0
0.16.5
0.16.4
0.16.3
0.16.2
0.16.1
0.16.0
0.15.2
0.15.1
0.15.0
0.14.4
0.14.3
0.14.2
0.14.1
0.14.0
0.13.1
0.13.0
0.12.1
0.12.0