Elasticsearch comes with a built-in mechanism, snapshot lifecycle management (SLM), to automatically snapshot all indices in your cluster.

The traditional approach is a simple cron job, but that is one more moving part to log, monitor, and maintain.

SLM works by creating “repositories”, which store snapshot data on cloud object stores (AWS S3, Google Cloud Storage, Azure Blob Storage) or distributed file systems (HDFS). You then create “policies” on top of these repos, which define the snapshot metadata: the naming convention, the snapshot schedule, and the retention rules. All of this can be configured using Elasticsearch’s built-in REST endpoints.

Goal

Our goal today is to achieve the following:

  1. Configure IAM credentials on the Google Compute Engine (GCE) VM which has Elasticsearch installed.
  2. Install the plugin required to snapshot to GCS.
  3. Configure a snapshot repository pointing to a Google Cloud Storage (GCS) bucket.
  4. Configure an SLM policy to snapshot all indices to the repo every 3 hours.
  5. Trigger the SLM policy manually and verify the snapshot.

Step 1: Configure IAM credentials on the VM

I’ll assume that you already have a GCE VM with Elasticsearch 7.7.1 up and running and that you have a GCS bucket for the snapshots. I’ll call my bucket gcs-es-snapshots.

To configure IAM for the snapshots, first find out your VM’s service account; GCE VMs use a service account much like AWS EC2 instances use instance profiles. This service account will need the storage.admin role to be able to snapshot. Download the JSON key file for this service account; we will add this key to the Elasticsearch keystore so Elasticsearch can authenticate with Google Cloud.
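
If you prefer the gcloud CLI (from Cloud Shell, or any machine where you are authenticated) to the Cloud Console, the lookup, role grant, and key download look roughly like the sketch below. The instance, zone, project, and service-account names are all placeholders for your own:

# All names below are placeholders; substitute your own VM, zone, project, and service account.
gcloud compute instances describe my-es-vm --zone=us-central1-a \
  --format='value(serviceAccounts[].email)'

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:es-snapshots@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

gcloud iam service-accounts keys create /tmp/key.json \
  --iam-account=es-snapshots@my-project.iam.gserviceaccount.com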

Upload your JSON key file to the VM (or copy-paste it using vim) to the path /tmp/key.json.

Once the file is in place, run:

/usr/share/elasticsearch/bin/elasticsearch-keystore add-file gcs.client.default.credentials_file /tmp/key.json
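
You can confirm the key was added by listing the keystore entries; gcs.client.default.credentials_file should appear in the output:

/usr/share/elasticsearch/bin/elasticsearch-keystore list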

Step 2: Install the GCS plugin

Now we’re going to install the GCS snapshot repository plugin to allow Elasticsearch to interact with our GCS bucket. You can follow the steps here, or directly execute the command below:

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-gcs

The link above also has options for offline install in case you want a repeatable and automated setup.
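
The plugin is only picked up when a node starts, so restart Elasticsearch after installing it (on every node, if you run a multi-node cluster); the restart also loads the keystore entry we added in Step 1. Assuming the standard package install managed by systemd:

sudo systemctl restart elasticsearch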

Step 3: Configure the snapshot repository to a GCS bucket

I’m going to create a snapshot repository called my_snapshot and point it to gcs-es-snapshots. I will also put the snapshots in a folder so that my bucket root stays clean.

The command is:

curl -X PUT "http://localhost:9200/_snapshot/my_snapshot?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "gcs",
  "settings": {
    "bucket": "gcs-es-snapshots",
    "base_path": "es_snapshots"
  }
}
'

If there were no errors, you should see the acknowledgement below:

{
  "acknowledged" : true
}

More settings for the GCS repository plugin are available here.

You can verify that the repo was configured correctly by running:

curl -X GET "http://localhost:9200/_snapshot?pretty=true"

{
  "my_snapshot" : {
    "type" : "gcs",
    "settings" : {
      "bucket" : "gcs-es-snapshots",
      "base_path" : "es_snapshots"
    }
  }
}
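
Reading the config back only proves it was stored. To check that Elasticsearch can actually reach and write to the bucket, you can also verify the repository:

curl -X POST "http://localhost:9200/_snapshot/my_snapshot/_verify?pretty"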

Step 4: Configure the SLM policy

With all the pieces in place, let’s create the SLM policy called 3hour-snapshots by running the command below:

curl -X PUT "http://localhost:9200/_slm/policy/3hour-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 */3 * * ?", 
  "name": "<snap-{now{yyyy.MM.dd.HH.mm}}>", 
  "repository": "my_snapshot", 
  "config": { 
    "indices": ["*"] 
  }
}
'

The output should be:

{
  "acknowledged" : true
}

The JSON body has the following properties:

  - schedule: a cron expression saying when to snapshot; 0 0 */3 * * ? fires on the hour, every 3 hours.
  - name: the naming pattern for each snapshot. The date-math expression resolves to names like snap-2020.06.22.09.48, to which SLM appends a unique suffix.
  - repository: the snapshot repository to write to, i.e. the my_snapshot repo from Step 3.
  - config: options applied to each snapshot; "indices": ["*"] takes every index.
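
One thing the policy above does not define is retention, so snapshots will accumulate until you delete them. SLM supports an optional retention block for automatic cleanup; the values below are only examples, so tune them to your needs:

curl -X PUT "http://localhost:9200/_slm/policy/3hour-snapshots?pretty" -H 'Content-Type: application/json' -d'
{
  "schedule": "0 0 */3 * * ?",
  "name": "<snap-{now{yyyy.MM.dd.HH.mm}}>",
  "repository": "my_snapshot",
  "config": {
    "indices": ["*"]
  },
  "retention": {
    "expire_after": "7d",
    "min_count": 5,
    "max_count": 50
  }
}
'

Note that retention is enforced on its own schedule (the cluster-wide slm.retention_schedule setting, which defaults to a daily run), not at snapshot time.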

To see your current policy, you can run:

curl -X GET "http://localhost:9200/_slm/policy?pretty"

Step 5: Manually execute our SLM policy

Our policy is scheduled to run every 3 hours, but we can force an execution to test things out by running:

curl -X POST "http://localhost:9200/_slm/policy/3hour-snapshots/_execute?pretty"

On execution, you should see the name of the new snapshot:

{
  "snapshot_name" : "snap-2020.06.22.09.48-ljjv7mznqyscjzldewkngw"
}
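
The _execute call returns as soon as the snapshot is initiated, so a large snapshot may still be running in the background. You can list in-progress snapshots against the repo with:

curl -X GET "http://localhost:9200/_snapshot/my_snapshot/_current?pretty"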

Now let’s check the policy status, which shows the last successful snapshot and the next scheduled execution:

curl -X GET "http://localhost:9200/_slm/policy/3hour-snapshots?human&pretty"

{
  "3hour-snapshots" : {
    "version" : 4,
    "modified_date" : "2020-06-22T08:25:03.571Z",
    "modified_date_millis" : 1592814303571,
    "policy" : {
      "name" : "<snap-{now{yyyy.MM.dd.HH.mm}}>",
      "schedule" : "0 0 */3 * * ?",
      "repository" : "my_snapshot",
      "config" : {
        "indices" : [
          "*"
        ]
      }
    },
    "last_success" : {
      "snapshot_name" : "snap-2020.06.22.09.48-ljjv7mznqyscjzldewkngw",
      "time_string" : "2020-06-22T09:48:15.353Z",
      "time" : 1592819295353
    },
    "next_execution" : "2020-06-22T12:00:00.000Z",
    "next_execution_millis" : 1592827200000,
    "stats" : {
      "policy" : "3hour-snapshots",
      "snapshots_taken" : 6,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 0,
      "snapshot_deletion_failures" : 0
    }
  }
}

To get a list of all snapshots taken against a repo, you can run:

curl -X GET "http://localhost:9200/_snapshot/my_snapshot/_all?pretty" 
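
Finally, a snapshot is only as good as your ability to restore it. A minimal restore request looks like the sketch below; my-index is a hypothetical index name, and the rename settings restore the data as a copy (restored_my-index) rather than overwriting the original, since restoring onto an open index fails:

curl -X POST "http://localhost:9200/_snapshot/my_snapshot/snap-2020.06.22.09.48-ljjv7mznqyscjzldewkngw/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}
'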

Conclusion

Elasticsearch’s built-in SLM is more useful than a traditional cron job for managing index snapshots. With the major cloud providers’ blob storage services supported out of the box, flexible snapshot schedules using cron syntax, and the ability to target specific indices, SLM makes a lot more sense than rolling a custom solution out of scripts and automation.
