Elasticsearch schema migration library

A Flyway-inspired Elasticsearch schema migration library

Categories: Search Business Logic Libraries Elasticsearch
GroupId: com.github.eemmiirr.lib
ArtifactId: elasticsearch-migration
Last Version: 1.2.0
Type: jar
Project URL: https://github.com/eemmiirr/elasticsearch-migration
Source Code Management: http://github.com/eemmiirr/elasticsearch-migration.git

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.github.eemmiirr.lib/elasticsearch-migration/ -->
<dependency>
    <groupId>com.github.eemmiirr.lib</groupId>
    <artifactId>elasticsearch-migration</artifactId>
    <version>1.2.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.github.eemmiirr.lib/elasticsearch-migration/
implementation 'com.github.eemmiirr.lib:elasticsearch-migration:1.2.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.github.eemmiirr.lib/elasticsearch-migration/
implementation ("com.github.eemmiirr.lib:elasticsearch-migration:1.2.0")

Buildr:

'com.github.eemmiirr.lib:elasticsearch-migration:jar:1.2.0'

Ivy:

<dependency org="com.github.eemmiirr.lib" name="elasticsearch-migration" rev="1.2.0">
  <artifact name="elasticsearch-migration" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.github.eemmiirr.lib', module='elasticsearch-migration', version='1.2.0')
)

SBT:

libraryDependencies += "com.github.eemmiirr.lib" % "elasticsearch-migration" % "1.2.0"

Leiningen:

[com.github.eemmiirr.lib/elasticsearch-migration "1.2.0"]

Dependencies

compile (12)

Group / Artifact | Type | Version
com.google.guava : guava | jar | 21.0
org.apache.commons : commons-lang3 | jar | 3.7
org.apache.directory.studio : org.apache.commons.io | jar | 2.4
org.reflections : reflections | jar | 0.9.11
com.fasterxml.jackson.core : jackson-databind | jar | 2.10.0
com.fasterxml.jackson.datatype : jackson-datatype-jdk8 | jar | 2.10.0
com.fasterxml.jackson.datatype : jackson-datatype-jsr310 | jar | 2.10.0
com.jayway.jsonpath : json-path | jar | 2.3.0
com.fasterxml.jackson.dataformat : jackson-dataformat-yaml | jar | 2.10.0
com.github.java-json-tools : json-schema-validator | jar | 2.2.10
org.elasticsearch.client : elasticsearch-rest-high-level-client | jar | 7.4.0
org.slf4j : slf4j-api | jar | 1.7.25

provided (1)

Group / Artifact | Type | Version
org.projectlombok : lombok | jar | 1.18.0

test (7)

Group / Artifact | Type | Version
junit : junit | jar | 4.12
org.hamcrest : hamcrest-all | jar | 1.3
com.jayway.restassured : rest-assured | jar | 2.9.0
org.apache.logging.log4j : log4j-api | jar | 2.8.2
org.apache.logging.log4j : log4j-core | jar | 2.8.2
org.apache.logging.log4j : log4j-1.2-api | jar | 2.8.2
org.apache.logging.log4j : log4j-slf4j-impl | jar | 2.8.2

Project Modules

There are no modules declared in this project.

Elasticsearch Migration

A simple and lightweight migration tool for Elasticsearch, based on Axel Fontaine's Flyway project. Elasticsearch Migration works just like Flyway but uses YAML files to describe changesets.

Requirements

  • Java (tested with JDK 8+)

Elasticsearch version | Tested with | Library version | groupId
6.x.x | 6.2.4 | 1.0.0 - 1.0.5 | com.hubrick.lib
7.x.x | 7.4.0 | 1.1.0 - 1.2.0 | com.github.eemmiirr.lib

Latest version

<dependency>
    <groupId>com.github.eemmiirr.lib</groupId>
    <artifactId>elasticsearch-migration</artifactId>
    <version>1.2.0</version>
</dependency>

Indexes

These indexes are created on the first run and are there to keep track of the migrations.

Migration version index (elasticsearch_migration_version)

Keeps track of the executed changesets. If a migration fails, its entry is moved to the state 'FAILED' and the failureMessage field contains the reason. The entry is not removed, and any changes applied up to that point stay in the cluster. There is no automatic rollback, so the cleanup has to be done manually.

{
    "settings": {
        "number_of_shards": 3
    },
    "mappings": {
        "dynamic": "strict",
        "_source": {
            "enabled": true
        },
        "properties": {
            "identifier": {
                "type": "keyword",
                "index": true
            },
            "version": {
                "type": "keyword",
                "index": true
            },
            "name": {
                "type": "keyword",
                "index": true
            },
            "sha256Checksum": {
                "type": "keyword",
                "index": true
            },
            "state": {
                "type": "keyword",
                "index": true
            },
            "failureMessage": {
                "type": "text",
                "index": true
            },
            "created": {
                "type": "date",
                "format": "date_time",
                "index": true
            }
        }
    }
}
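
The sha256Checksum field suggests that the content of each changeset is hashed so that later edits to an already applied file can be detected, much as Flyway does. How this library computes or compares the checksum is not described here, so the following is only a minimal sketch under that assumption; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the library's API.
final class ChangesetChecksum {

    // Returns the lowercase hex SHA-256 digest of the given changeset text.
    static String sha256Hex(String changesetContent) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(changesetContent.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // True when the stored checksum no longer matches the current file content.
    static boolean hasChanged(String storedChecksum, String changesetContent) {
        return !storedChecksum.equalsIgnoreCase(sha256Hex(changesetContent));
    }
}
```

A check of this kind would typically run before applying anything, comparing each stored entry with the changeset file of the same version.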

Migration lock index (elasticsearch_migration_lock)

Used to create a pessimistic lock during the migration so that only one client makes changes at a time. If the migration is aborted for any reason, the lock is not released and has to be removed manually.

{
    "settings": {
        "number_of_shards": 3
    },
    "mappings": {
        "dynamic": "strict",
        "_source": {
            "enabled": true
        },
        "properties": {
            "created": {
                "type": "date",
                "format": "date_time",
                "index": true
            }
        }
    }
}
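
Removing a stale lock manually comes down to finding the lock document and deleting it. The sketch below uses only the standard Elasticsearch search and delete-document endpoints and plain JDK classes. The id of the lock document is not documented here, so it is a parameter that has to be read from the search result first; the class and method names are hypothetical, and the id is assumed to be URL-safe.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of the library's API.
final class MigrationLockCleanup {

    static final String LOCK_INDEX = "elasticsearch_migration_lock";

    // URI for listing the lock documents currently stored in the cluster (GET).
    static URI searchUri(String baseUrl) {
        return URI.create(stripTrailingSlash(baseUrl) + "/" + LOCK_INDEX + "/_search");
    }

    // URI of a single lock document; documentId is assumed to be URL-safe.
    static URI documentUri(String baseUrl, String documentId) {
        return URI.create(stripTrailingSlash(baseUrl) + "/" + LOCK_INDEX + "/_doc/" + documentId);
    }

    // Sends DELETE for one lock document and returns the HTTP status code.
    static int deleteLock(String baseUrl, String documentId) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) documentUri(baseUrl, documentId).toURL().openConnection();
        try {
            connection.setRequestMethod("DELETE");
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    private static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }
}
```

Before deleting anything, make sure that no migration is actually still running, since the lock is the only thing preventing concurrent changes.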

YAML changesets

The changesets are defined in versioned YAML files named V{version}__{name}.yaml (for example, V1_0_0__singularity.yaml). The YAML files have to conform to the project's YAML Schema.
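
To make the naming convention concrete, here is a minimal sketch that splits a file name into its version and name parts. It is not the library's own parser: the class name, the restriction to numeric version segments and the conversion of underscores into dots (borrowed from Flyway) are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library's API.
final class ChangesetFileName {

    // V{version}__{name}.yaml, e.g. V1_0_0__singularity.yaml
    private static final Pattern PATTERN =
            Pattern.compile("V(\\d+(?:_\\d+)*)__(.+)\\.yaml");

    final String version;
    final String name;

    private ChangesetFileName(String version, String name) {
        this.version = version;
        this.name = name;
    }

    // Returns the parsed parts, or empty if the file name does not follow the convention.
    static Optional<ChangesetFileName> parse(String fileName) {
        Matcher matcher = PATTERN.matcher(fileName);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        // Assumption: single underscores separate version segments, as in Flyway.
        String version = matcher.group(1).replace('_', '.');
        return Optional.of(new ChangesetFileName(version, matcher.group(2)));
    }
}
```

With this sketch, V1_0_0__singularity.yaml yields the version 1.0.0 and the name singularity, while a file without the V prefix or the double underscore is rejected.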

Currently the following migration types are supported:

  • CREATE_INDEX
  • DELETE_INDEX
  • CREATE_OR_UPDATE_INDEX_TEMPLATE
  • DELETE_INDEX_TEMPLATE
  • UPDATE_MAPPING
  • INDEX_DOCUMENT
  • UPDATE_DOCUMENT
  • DELETE_DOCUMENT
  • CREATE_INGEST_PIPELINE
  • ALIASES
  • REINDEX
  • DELETE_INGEST_PIPELINE

Example changeset

migrations:
  - type: CREATE_INDEX
    index: 'test_index'
    definition: >
      {
          "settings": {
              "number_of_shards": 3
          },
          "mappings": {
              "dynamic": false,
              "_source": {
                  "enabled": true
              },
              "properties": {
                  "user": {
                      "type": "keyword",
                      "index": true
                  },
                  "post_date": {
                      "type": "keyword",
                      "index": true
                  },
                  "message": {
                      "type": "keyword",
                      "index": true
                  }
              }
          }
      }
  - type: CREATE_OR_UPDATE_INDEX_TEMPLATE
    template: 'test_template'
    definition: >
      {
        "index_patterns": ["foo*", "bar*"],
        "settings": {
          "number_of_shards": 1
        },
        "mappings": {
          "properties": {
            "host_name": {
              "type": "keyword"
            },
            "created_at": {
              "type": "date",
              "format": "EEE MMM dd HH:mm:ss Z YYYY"
            }
          }
        }
      }
  - type: CREATE_INGEST_PIPELINE
    id: 'test_pipeline'
    definition: >
        {
          "description" : "rename xxx",
          "processors" : [
            {
              "rename": {
                "field": "xxx",
                "target_field": "yyy"
              }
            }
          ]
        }
  - type: ALIASES
    definition: >
        {
          "actions": [
            {
              "add": {
                "index": "test_index_1",
                "alias": "test_index_alias"
              }
            },
            {
              "add": {
                "index": "test_index_2",
                "alias": "test_index_alias"
              }
            }
          ]
        }
  - type: REINDEX
    definition: >
      {
        "source": {
          "index": "test_index_1"
        },
        "dest": {
          "index": "test_index_2"
        }
      }
  - type: UPDATE_MAPPING
    indices:
      - 'test_index'
    definition: >
      {
        "properties": {
          "email": {
            "type": "keyword"
          }
        }
      }
  - type: INDEX_DOCUMENT
    index: 'test_index'
    id: '1'
    definition: >
      {
          "user" : "kimchy",
          "post_date" : "2009-11-15T14:12:12",
          "message" : "trying out Elasticsearch"
      }
  - type: UPDATE_DOCUMENT
    index: 'test_index'
    id: '1'
    definition: >
      {
          "doc" : {
              "user" : "new_user"
          }
      }
  - type: DELETE_DOCUMENT
    index: 'test_index'
    id: '1'
  - type: DELETE_INDEX_TEMPLATE
    template: 'test_template'
  - type: DELETE_INDEX
    index: 'test_index'
  - type: DELETE_INGEST_PIPELINE
    id: 'test_pipeline'
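
The example shows that each migration type comes with its own set of keys. As a rough illustration, the sketch below checks a parsed entry (a map of its top-level keys) against the keys used in the example above. The key sets are inferred from that example only, not from the YAML Schema, so the real schema may require or allow more; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library's API.
final class ChangesetEntryCheck {

    enum MigrationType {
        CREATE_INDEX, DELETE_INDEX, CREATE_OR_UPDATE_INDEX_TEMPLATE, DELETE_INDEX_TEMPLATE,
        UPDATE_MAPPING, INDEX_DOCUMENT, UPDATE_DOCUMENT, DELETE_DOCUMENT,
        CREATE_INGEST_PIPELINE, DELETE_INGEST_PIPELINE, ALIASES, REINDEX
    }

    // Keys that accompany each type in the example changeset (an inference, not the schema).
    private static final Map<MigrationType, List<String>> KEYS_IN_EXAMPLE =
            new EnumMap<>(MigrationType.class);

    static {
        KEYS_IN_EXAMPLE.put(MigrationType.CREATE_INDEX, Arrays.asList("index", "definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.DELETE_INDEX, Arrays.asList("index"));
        KEYS_IN_EXAMPLE.put(MigrationType.CREATE_OR_UPDATE_INDEX_TEMPLATE, Arrays.asList("template", "definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.DELETE_INDEX_TEMPLATE, Arrays.asList("template"));
        KEYS_IN_EXAMPLE.put(MigrationType.UPDATE_MAPPING, Arrays.asList("indices", "definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.INDEX_DOCUMENT, Arrays.asList("index", "id", "definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.UPDATE_DOCUMENT, Arrays.asList("index", "id", "definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.DELETE_DOCUMENT, Arrays.asList("index", "id"));
        KEYS_IN_EXAMPLE.put(MigrationType.CREATE_INGEST_PIPELINE, Arrays.asList("id", "definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.DELETE_INGEST_PIPELINE, Arrays.asList("id"));
        KEYS_IN_EXAMPLE.put(MigrationType.ALIASES, Arrays.asList("definition"));
        KEYS_IN_EXAMPLE.put(MigrationType.REINDEX, Arrays.asList("definition"));
    }

    // Returns the keys from the example that the entry lacks.
    // Throws IllegalArgumentException if the type is missing or unknown.
    static List<String> missingKeys(Map<String, ?> entry) {
        MigrationType type = MigrationType.valueOf(String.valueOf(entry.get("type")));
        List<String> missing = new ArrayList<>();
        for (String key : KEYS_IN_EXAMPLE.get(type)) {
            if (!entry.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

A check like this is only a convenience for catching typos early; validation against the YAML Schema remains the authoritative test.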

Usage

Each service has to define an identifier, which identifies the owner of the indexes, templates, documents, etc. and of the locks in the ES cluster. The easiest way is to use the name of the owning service as the identifier.

Example:

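// "test-service" is the identifier of the service that owns the migrations.
// new URL(...) throws the checked MalformedURLException, which the caller has to handle.
final ElasticsearchMigration elasticsearchMigration = new ElasticsearchMigration(
  ElasticsearchMigrationConfig.builder(
    "test-service",
    ElasticsearchConfig.builder(new URL("http://localhost:9200")).build()
  ).basePackage("migration.es").build()
);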

elasticsearchMigration.migrate();

Migration from previous un-managed schema

  1. Collect your entire existing schema in one YAML changeset.
  2. Create the 'Migration version index' and the 'Migration lock index' using the schemas above or the ones in the source tree.
  3. Start your application manually. Once it has started, there will be one entry in the 'elasticsearch_migration_version' index. Copy this entry over to your staging/production ES cluster.

Improvements

  • If a migration is aborted midway, the lock stays in place forever. It should be released automatically after a TTL.
  • Figure out the number of shards and make use of wait_for_active_shards for maximum consistency.
  • Add more functionality.
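
The TTL idea from the first item could build on the created field of the lock document. The following is a sketch of the expiry check only, not something the library currently does; the class name, the method names and the choice of TTL are hypothetical. It assumes the timestamp is an ISO 8601 string with an offset or Z suffix, as the date_time format produces.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical helper illustrating a possible improvement, not existing behaviour.
final class LockTtl {

    // Parses an ISO 8601 timestamp with offset, e.g. 2020-01-01T00:00:00.000Z.
    static Instant parseCreated(String created) {
        return OffsetDateTime.parse(created).toInstant();
    }

    // True when a lock created at `created` is older than `ttl` as seen at `now`.
    static boolean isExpired(Instant created, Duration ttl, Instant now) {
        return created.plus(ttl).isBefore(now);
    }
}
```

A client that finds an expired lock could then remove it and retry, although the TTL would have to be clearly longer than the slowest expected migration.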

Limitations

  • The tool does not roll back the cluster when a migration fails. You are expected to restore a backup manually.

License

Apache License, Version 2.0

Versions

Version
1.2.0
1.1.0