Elasticsearch schema migration library

A Flyway-inspired Elasticsearch schema migration library

License	Apache License Version 2.0
Categories	Search Business Logic Libraries Elasticsearch
GroupId	com.github.eemmiirr.lib
ArtifactId	elasticsearch-migration
Last Version	1.2.0
Release Date	Nov 26, 2019
Type	jar
Description	A Flyway-inspired Elasticsearch schema migration library
Project URL	https://github.com/eemmiirr/elasticsearch-migration
Source Code Management	http://github.com/eemmiirr/elasticsearch-migration.git

Download elasticsearch-migration

Filename	Size
elasticsearch-migration-1.2.0.pom
elasticsearch-migration-1.2.0.jar	83 KB
elasticsearch-migration-1.2.0-sources.jar	55 KB
elasticsearch-migration-1.2.0-javadoc.jar	285 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.github.eemmiirr.lib/elasticsearch-migration/ -->
<dependency>
    <groupId>com.github.eemmiirr.lib</groupId>
    <artifactId>elasticsearch-migration</artifactId>
    <version>1.2.0</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.github.eemmiirr.lib/elasticsearch-migration/
implementation 'com.github.eemmiirr.lib:elasticsearch-migration:1.2.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.github.eemmiirr.lib/elasticsearch-migration/
implementation ("com.github.eemmiirr.lib:elasticsearch-migration:1.2.0")

Apache Buildr

'com.github.eemmiirr.lib:elasticsearch-migration:jar:1.2.0'

Apache Ivy

<dependency org="com.github.eemmiirr.lib" name="elasticsearch-migration" rev="1.2.0">
  <artifact name="elasticsearch-migration" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.github.eemmiirr.lib', module='elasticsearch-migration', version='1.2.0')
)

Scala SBT

libraryDependencies += "com.github.eemmiirr.lib" % "elasticsearch-migration" % "1.2.0"

Leiningen

[com.github.eemmiirr.lib/elasticsearch-migration "1.2.0"]

Dependencies

compile (12)

Group / Artifact	Type	Version
com.google.guava : guava	jar	21.0
org.apache.commons : commons-lang3	jar	3.7
org.apache.directory.studio : org.apache.commons.io	jar	2.4
org.reflections : reflections	jar	0.9.11
com.fasterxml.jackson.core : jackson-databind	jar	2.10.0
com.fasterxml.jackson.datatype : jackson-datatype-jdk8	jar	2.10.0
com.fasterxml.jackson.datatype : jackson-datatype-jsr310	jar	2.10.0
com.jayway.jsonpath : json-path	jar	2.3.0
com.fasterxml.jackson.dataformat : jackson-dataformat-yaml	jar	2.10.0
com.github.java-json-tools : json-schema-validator	jar	2.2.10
org.elasticsearch.client : elasticsearch-rest-high-level-client	jar	7.4.0
org.slf4j : slf4j-api	jar	1.7.25

provided (1)

Group / Artifact	Type	Version
org.projectlombok : lombok	jar	1.18.0

test (7)

Group / Artifact	Type	Version
junit : junit	jar	4.12
org.hamcrest : hamcrest-all	jar	1.3
com.jayway.restassured : rest-assured	jar	2.9.0
org.apache.logging.log4j : log4j-api	jar	2.8.2
org.apache.logging.log4j : log4j-core	jar	2.8.2
org.apache.logging.log4j : log4j-1.2-api	jar	2.8.2
org.apache.logging.log4j : log4j-slf4j-impl	jar	2.8.2

Project Modules

There are no modules declared in this project.

Elasticsearch Migration

A simple, lightweight migration tool for Elasticsearch, based on Axel Fontaine's Flyway project. Elasticsearch Migration works like Flyway but uses YAML files to describe changesets.

Requirements

Java (Tested with JDK 8+)

Elasticsearch version	Tested with	Library version	groupId
6.x.x	6.2.4	1.0.0 - 1.0.5	com.hubrick.lib
7.x.x	7.4.0	1.1.0 - 1.2.0	com.github.eemmiirr.lib

Latest version

<dependency>
    <groupId>com.github.eemmiirr.lib</groupId>
    <artifactId>elasticsearch-migration</artifactId>
    <version>1.2.0</version>
</dependency>

Indexes

These indexes are created on the first run and are used to keep track of the migrations.

Migration version index (elasticsearch_migration_version)

Keeps track of the executed changesets. If a migration fails, its entry transitions to the state 'FAILED' and the failureMessage field contains the reason. The entry is not removed, and any changes applied up to that point stay in the cluster. There is no automatic rollback, so cleanup has to be done manually.

{
    "settings": {
        "number_of_shards": 3
    },
    "mappings": {
        "dynamic": "strict",
        "_source": {
            "enabled": true
        },
        "properties": {
            "identifier": {
                "type": "keyword",
                "index": true
            },
            "version": {
                "type": "keyword",
                "index": true
            },
            "name": {
                "type": "keyword",
                "index": true
            },
            "sha256Checksum": {
                "type": "keyword",
                "index": true
            },
            "state": {
                "type": "keyword",
                "index": true
            },
            "failureMessage": {
                "type": "text",
                "index": true
            },
            "created": {
                "type": "date",
                "format": "date_time",
                "index": true
            }
        }
    }
}
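The version index stores a sha256Checksum for every executed changeset. The sketch below shows how such a checksum can be computed with the JDK's MessageDigest and used to notice that a changeset was edited after it ran. It is only an illustration: the class ChangesetChecksum is hypothetical, and which bytes the library actually hashes (raw file, normalized YAML, or something else) is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of the library). Assumption: the checksum is
// taken over the UTF-8 bytes of the changeset text.
final class ChangesetChecksum {

    private ChangesetChecksum() {
    }

    // Returns the lowercase hex SHA-256 digest of the UTF-8 encoded content.
    static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // %02x formats a negative byte as its unsigned value
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // True when the current content no longer matches the recorded checksum,
    // i.e. the changeset file was edited after it was applied.
    static boolean isModified(String currentContent, String recordedChecksum) {
        return !sha256Hex(currentContent).equalsIgnoreCase(recordedChecksum);
    }
}
```

Comparing case-insensitively keeps the check robust if a stored checksum happens to be upper-case hex.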

Migration lock index (elasticsearch_migration_lock)

Used to hold a pessimistic lock during the migration so that only one client makes changes at a time. If the migration is aborted for any reason, the lock is not removed and has to be deleted manually.

{
    "settings": {
        "number_of_shards": 3
    },
    "mappings": {
        "dynamic": "strict",
        "_source": {
            "enabled": true
        },
        "properties": {
            "created": {
                "type": "date",
                "format": "date_time",
                "index": true
            }
        }
    }
}
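The locking rule can be illustrated with an in-memory stand-in for the lock index. This is a sketch under assumptions, not the library's code: InMemoryMigrationLock is hypothetical, and it assumes one lock document per service identifier whose creation only succeeds when no such document exists yet (create-only semantics, modeled here with ConcurrentHashMap.putIfAbsent). The stored value plays the role of the "created" field.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of elasticsearch_migration_lock.
// Assumption: the lock is keyed by the service identifier.
final class InMemoryMigrationLock {

    // identifier -> value of the lock document's "created" field
    private final ConcurrentMap<String, Instant> locks = new ConcurrentHashMap<>();

    // Returns true if this caller obtained the lock, false if it is already held.
    boolean tryAcquire(String identifier, Instant now) {
        return locks.putIfAbsent(identifier, now) == null;
    }

    // Releases the lock after a finished migration. An aborted migration never
    // reaches this call, which is why a leftover lock must be removed manually.
    void release(String identifier) {
        locks.remove(identifier);
    }
}
```

A second client calling tryAcquire with the same identifier gets false until release is called, which is the "one client at a time" guarantee described above.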

YAML changesets

The changesets are defined in versioned YAML files named V{version}__{name}.yaml (example: V1_0_0__singularity.yaml). The YAML files have to conform to the project's YAML Schema.
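The naming convention encodes the execution order. The sketch below parses such file names and sorts them by version, assuming Flyway-style rules: version parts are separated by single underscores, the name follows a double underscore, and versions compare part by part as numbers. ChangesetFile is a hypothetical class, and the library's own parsing rules may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for V{version}__{name}.yaml (assumed Flyway-like rules).
final class ChangesetFile {

    private static final Pattern FILE_NAME =
            Pattern.compile("V(\\d+(?:_\\d+)*)__(.+)\\.yaml");

    // Part-by-part numeric comparison; missing trailing parts count as 0.
    static final Comparator<ChangesetFile> BY_VERSION = (a, b) -> {
        int length = Math.max(a.version.length, b.version.length);
        for (int i = 0; i < length; i++) {
            int x = i < a.version.length ? a.version[i] : 0;
            int y = i < b.version.length ? b.version[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    };

    final String versionText; // e.g. "1_0_0"
    final int[] version;      // e.g. {1, 0, 0}
    final String name;        // e.g. "singularity"

    private ChangesetFile(String versionText, int[] version, String name) {
        this.versionText = versionText;
        this.version = version;
        this.name = name;
    }

    // Parses a name such as "V1_0_0__singularity.yaml".
    static ChangesetFile parse(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a changeset file name: " + fileName);
        }
        int[] version = Arrays.stream(m.group(1).split("_"))
                .mapToInt(Integer::parseInt)
                .toArray();
        return new ChangesetFile(m.group(1), version, m.group(2));
    }

    // Parses all names and sorts them into execution order
    // (1_9 before 1_10, which a plain string sort would get wrong).
    static List<ChangesetFile> inOrder(List<String> fileNames) {
        List<ChangesetFile> files = new ArrayList<>();
        for (String fileName : fileNames) {
            files.add(parse(fileName));
        }
        files.sort(BY_VERSION);
        return files;
    }
}
```

Numeric comparison is the important design choice: sorting the raw strings would place V1_10 before V1_9.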

Currently the following migration types are supported:

CREATE_INDEX
DELETE_INDEX
CREATE_OR_UPDATE_INDEX_TEMPLATE
DELETE_INDEX_TEMPLATE
UPDATE_MAPPING
INDEX_DOCUMENT
UPDATE_DOCUMENT
DELETE_DOCUMENT
CREATE_INGEST_PIPELINE
ALIASES
REINDEX
DELETE_INGEST_PIPELINE

Example changeset

migrations:
  - type: CREATE_INDEX
    index: 'test_index'
    definition: >
      {
          "settings": {
              "number_of_shards": 3
          },
          "mappings": {
              "dynamic": false,
              "_source": {
                  "enabled": true
              },
              "properties": {
                  "user": {
                      "type": "keyword",
                      "index": true
                  },
                  "post_date": {
                      "type": "keyword",
                      "index": true
                  },
                  "message": {
                      "type": "keyword",
                      "index": true
                  }
              }
          }
      }
  - type: CREATE_OR_UPDATE_INDEX_TEMPLATE
    template: 'test_template'
    definition: >
      {
        "index_patterns": ["foo*", "bar*"],
        "settings": {
          "number_of_shards": 1
        },
        "mappings": {
          "properties": {
            "host_name": {
              "type": "keyword"
            },
            "created_at": {
              "type": "date",
              "format": "EEE MMM dd HH:mm:ss Z YYYY"
            }
          }
        }
      }
  - type: CREATE_INGEST_PIPELINE
    id: 'test_pipeline'
    definition: >
        {
          "description" : "rename xxx",
          "processors" : [
            {
              "rename": {
                "field": "xxx",
                "target_field": "yyy"
              }
            }
          ]
        }
  - type: ALIASES
    definition: >
        {
          "actions": [
            {
              "add": {
                "index": "test_index_1",
                "alias": "test_index_alias"
              }
            },
            {
              "add": {
                "index": "test_index_2",
                "alias": "test_index_alias"
              }
            }
          ]
        }
  - type: REINDEX
    definition: >
      {
        "source": {
          "index": "test_index_1"
        },
        "dest": {
          "index": "test_index_2"
        }
      }
  - type: UPDATE_MAPPING
    indices:
      - 'test_index'
    definition: >
      {
        "properties": {
          "email": {
            "type": "keyword"
          }
        }
      }
  - type: INDEX_DOCUMENT
    index: 'test_index'
    id: '1'
    definition: >
      {
          "user" : "kimchy",
          "post_date" : "2009-11-15T14:12:12",
          "message" : "trying out Elasticsearch"
      }
  - type: UPDATE_DOCUMENT
    index: 'test_index'
    id: '1'
    definition: >
      {
          "doc" : {
              "user" : "new_user"
          }
      }
  - type: DELETE_DOCUMENT
    index: 'test_index'
    id: '1'
  - type: DELETE_INDEX_TEMPLATE
    template: 'test_template'
  - type: DELETE_INDEX
    index: 'test_index'
  - type: DELETE_INGEST_PIPELINE
    id: 'test_pipeline'
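The example above uses a different set of keys for each migration type. The sketch below records those keys in a Java enum and reports which of them a changeset entry lacks. It is inferred from the example only and is not the library's API: the project's YAML Schema is authoritative and may allow or require other keys. MigrationType and missingKeys are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical enum: keys each type uses in the example changeset
// (besides "type"). Not taken from the official YAML Schema.
enum MigrationType {
    CREATE_INDEX("index", "definition"),
    DELETE_INDEX("index"),
    CREATE_OR_UPDATE_INDEX_TEMPLATE("template", "definition"),
    DELETE_INDEX_TEMPLATE("template"),
    UPDATE_MAPPING("indices", "definition"),
    INDEX_DOCUMENT("index", "id", "definition"),
    UPDATE_DOCUMENT("index", "id", "definition"),
    DELETE_DOCUMENT("index", "id"),
    CREATE_INGEST_PIPELINE("id", "definition"),
    ALIASES("definition"),
    REINDEX("definition"),
    DELETE_INGEST_PIPELINE("id");

    private final Set<String> keys;

    MigrationType(String... usedKeys) {
        this.keys = Collections.unmodifiableSet(new TreeSet<>(Arrays.asList(usedKeys)));
    }

    // Returns the example's keys for this type that the given entry does not contain.
    Set<String> missingKeys(Map<String, ?> entry) {
        Set<String> missing = new TreeSet<>(keys);
        missing.removeAll(entry.keySet());
        return missing;
    }
}
```

For instance, a DELETE_DOCUMENT entry that only sets index would be reported as missing id.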

Usage

Each service has to define an identifier that identifies the owner of the indexes, templates, documents, etc. and of the locks in the ES cluster. The easiest choice is the name of the service that owns them.

Example:

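import java.net.URL;

// "test-service" is the identifier that owns this service's indexes, templates, documents and locks.
// "migration.es" is the base package that holds the changeset YAML files.
final ElasticsearchMigration elasticsearchMigration = new ElasticsearchMigration(
  ElasticsearchMigrationConfig.builder(
    "test-service",
    ElasticsearchConfig.builder(new URL("http://localhost:9200")).build()
  ).basePackage("migration.es").build()
);

// Runs the migration. Note that new URL(...) throws MalformedURLException,
// so this code must handle or declare it.
elasticsearchMigration.migrate();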
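Because the identifier names the owner, several services can share one cluster and one version index while keeping separate histories. The sketch below shows that idea under a Flyway-like assumption: only version-index entries carrying this service's identifier count as applied, and every discovered version without such an entry is pending. MigrationPlan and VersionEntry are hypothetical and do not describe how migrate() is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical planning step (assumption, not the library's code).
final class MigrationPlan {

    // Minimal view of an elasticsearch_migration_version entry.
    static final class VersionEntry {
        final String identifier;
        final String version;

        VersionEntry(String identifier, String version) {
            this.identifier = identifier;
            this.version = version;
        }
    }

    private MigrationPlan() {
    }

    // Returns the discovered versions (already in execution order) that have
    // no version-index entry for the given identifier yet.
    static List<String> pending(String identifier,
                                List<String> discoveredVersions,
                                List<VersionEntry> entries) {
        Set<String> recorded = new HashSet<>();
        for (VersionEntry entry : entries) {
            if (entry.identifier.equals(identifier)) {
                recorded.add(entry.version);
            }
        }
        List<String> result = new ArrayList<>();
        for (String version : discoveredVersions) {
            if (!recorded.contains(version)) {
                result.add(version);
            }
        }
        return result;
    }
}
```

An entry written by "other-service" for version 1_1_0 does not mark 1_1_0 as applied for "test-service".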

Migration from previous un-managed schema

Collect your entire existing schema in one YAML changeset.
Create the 'Migration version index' and the 'Migration lock index' using the schemas above or the ones in the source tree.
Start your application manually. Once it has started, there will be one entry in the 'elasticsearch_migration_version' index. Copy this entry over to your staging/production ES cluster.

Improvements

If a migration is aborted midway, the lock stays in place forever. Release it automatically after a TTL.
Determine the number of shards and use wait_for_active_shards for maximum consistency.
Add more functionality.
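The TTL idea from the list above can be sketched with java.time, using the "created" field of the lock document. This describes a proposed improvement, not current library behaviour; LockTtl is a hypothetical class, and it assumes the stored "created" value is an ISO-8601 timestamp with an offset, as the date_time format produces.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical helper for the proposed lock TTL (not implemented by the library).
final class LockTtl {

    private LockTtl() {
    }

    // Parses a "created" value such as "2019-11-26T10:00:00.000Z"
    // (assumed ISO-8601 with a zone offset).
    static Instant parseCreated(String created) {
        return OffsetDateTime.parse(created).toInstant();
    }

    // True once the lock is at least ttl old, so it may be treated as abandoned.
    static boolean isExpired(Instant created, Duration ttl, Instant now) {
        return !now.isBefore(created.plus(ttl));
    }
}
```

The TTL has to be longer than the slowest legitimate migration (a large REINDEX, for example); otherwise a second client could take over a lock that is still in use.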

Limitations

The tool does not roll back the database when a migration fails. You are expected to restore from a backup manually.

License

Apache License, Version 2.0

Versions

Version
1.2.0 Nov 26, 2019
1.1.0 Oct 28, 2019
