Elasticsearch mappings and templates

Overview

In the relational database world, you create tables to store similar items.  In Elasticsearch, the equivalent of the table is a type.

You eventually get around to defining the properties of each field, be they char, varchar, auto-incrementing unsigned integer, decimal, etc.  Elasticsearch is no different, except that these definitions are called mappings.

Mappings

Mappings tell Elasticsearch how to deal with your field:

  • what type of data does it contain?
  • should the data be indexed?
  • should it be tokenized (and how)?

If you just blindly throw data at Elasticsearch, it will apply defaults based on the first value it sees.  A value of “foo” indicates a string; 1.01 indicates a floating-point number, and so on.

A major problem comes when the value is not indicative of the type.  What if your first string value contained “2015-04-01”?  Elasticsearch decides that the field is a date, so your next value of “foo” is now invalid.  The same goes for plain numbers – if the first value is 1, the type is now integer, and the next value of 1.01 is a problem (see “Using the Wrong Mapping” below).
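
You can watch this happen on a throwaway index (the names “test_index” and “my_type” here are arbitrary): the first document below causes the field to be mapped as a date, so the second is rejected with a mapper parsing error:

$ curl -XPUT 'http://localhost:9200/test_index/my_type/1' -d '{ "my_field" : "2015-04-01" }'
$ curl -XPUT 'http://localhost:9200/test_index/my_type/2' -d '{ "my_field" : "foo" }'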

The best way to deal with this is to create your own mapping, where you explicitly define the type of each field.  Here’s a sample:

$ curl -XPUT 'http://localhost:9200/my_index/_mapping/my_type' -d '
{
  "my_type" : {
    "properties" : {
      "my_field" : {"type" : "string", "store" : true }
    }
  }
}
'

Defined as a string, a value of “2015-04-01” in my_field would not be interpreted as a date.
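
You can confirm how a field ended up being mapped by asking Elasticsearch for the mapping back:

$ curl -XGET 'http://localhost:9200/my_index/_mapping/my_type?pretty'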

Nested fields are described as nested properties.  “address.city” could be mapped like this:

{
  "my_type" : {
    "properties" : {
      "address" : {
        "properties" : {
          "city" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

There are a lot of properties that can be specified for a given field.  The Core Types page lists them.

Two of the more important ones are:

  • “index”: “not_analyzed”, which keeps Elasticsearch from tokenizing your value – especially useful for log data.
  • “doc_values”: true, which can reduce memory usage by keeping field data on disk, as described in the docs.
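
Putting those two together, a sketch of a mapping for a log-style field (using a hypothetical “logfile” field) that preserves the original value and is cheap to aggregate on might look like this:

{
  "my_type" : {
    "properties" : {
      "logfile" : {
        "type" : "string",
        "index" : "not_analyzed",
        "doc_values" : true
      }
    }
  }
}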

If you use a new index every day, you would need to apply the mapping every day when the index was created.  Or, you can use templates.

Templates

Templates define settings and mappings that will be used when a new index is created.  They are especially useful if you create daily indexes (e.g. from logstash) or you have dynamic field names.

In this example, any newly created index whose name matches the pattern “my_*” will have its “my_field” field mapped as a string.

curl -XPUT localhost:9200/_template/my_template -d '
{
  "template" : "my_*",
  "mappings" : {
    "my_type" : {
      "my_field" : { "type" : "string" }
    }
  }
}
'

Note that the template name is global to the cluster – templates are not scoped to a single index – so a second PUT of “fancy_template” replaces the first rather than creating another.
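
The _template endpoint also lets you inspect and remove templates, which is handy for checking what the cluster already knows about:

$ curl -XGET 'http://localhost:9200/_template/my_template?pretty'
$ curl -XDELETE 'http://localhost:9200/_template/my_template'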

Templates still require you to know the names of the fields in advance, though.

Dynamic Templates

A dynamic template lets you tell Elasticsearch what to do with any field that matches (or doesn’t match) the definition, which can include:

  • name, including wildcards or partial path matches
  • type

This dynamic template will take any string and make it not_analyzed and use doc_values:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        { "my_dtemplate": { 
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "analyzer": "not_analyzed",
              "doc_values": true
            }
        }}
      ]
    }
  }
}

Or force any nested field that ends in “counter” to be an integer:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        { "my_dtemplate": {
            "path_match": "*.counter", 
            "mapping": {
              "type": "integer"
            }
        }}
      ]
    }
  }
}
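
Matching on the field name itself works the same way.  As a sketch, this would force any field whose name ends in “_count” (a hypothetical naming convention) to be an integer:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        { "my_dtemplate": {
            "match": "*_count",
            "mapping": {
              "type": "integer"
            }
        }}
      ]
    }
  }
}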

Logstash

One of the first things that early logstash users discovered was that Elasticsearch is a text search engine, not a log search engine.  If you gave it a string field, like:

"logfile": "/var/log/httpd/access_log"

Elasticsearch would tokenize it and index the tokens:

"logfile": ["var", "log", "httpd", "access_log"]

which makes it impossible to search on, or display, the original value.

To alleviate this initial frustration, logstash was shipped with a default mapping that included a “raw” field for every string, set as not_analyzed.  Accessing logfile.raw would return you back the original, un-tokenized string.
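
For example, a term query against the “raw” sub-field matches on the exact path, which the analyzed field can no longer do:

$ curl -XGET 'http://localhost:9200/logstash-2015.04.01/_search?pretty' -d '{
  "query" : {
    "term" : { "logfile.raw" : "/var/log/httpd/access_log" }
  }
}'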

This is a great work-around, and spared many logstash users some immediate frustration, but it’s not the right solution.  The right solution is to set up your own mapping and treat each field as you know it should be treated.

Note that the extra “raw” field will be going away in a future release of logstash.

Using the Wrong Mapping

If you try to insert a document whose field types don’t match the mapping, Elasticsearch may try to help.  If possible, it will try to “coerce” (cast) the data from one type to another (“int to string”, etc.).  Elasticsearch will even try “string to int”, which will work for “2.0” but not for “hello”.  Check the value of the index.mapping.coerce parameter and any messages in the Elasticsearch logs.
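
As a quick illustration (assuming a hypothetical “my_int_field” already mapped as an integer, and coercion left enabled), the first insert below is coerced while the second is rejected:

$ curl -XPUT 'http://localhost:9200/my_index/my_type/1' -d '{ "my_int_field" : "2.0" }'
$ curl -XPUT 'http://localhost:9200/my_index/my_type/2' -d '{ "my_int_field" : "hello" }'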

Updating a Template

If you’re using logstash, it ships with a default template called “logstash”.  To make changes to this template, first pull it:

curl -XGET 'http://localhost:9200/_template/logstash?pretty' > /tmp/logstash.template

Next, edit the file to remove the outer wrapper – the part that looks like this:

{
 "logstash" :

and the matching } at the end of the file.

Then, edit the file as desired (yes, that’s the tricky part!).
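
For example (with a hypothetical “client_ip” field), you might add an explicit entry under the template’s mappings so the field is typed correctly from the first index onward:

"properties" : {
  "client_ip" : { "type" : "ip" }
}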

While you’re there, notice this line, which we’ll reference below.

"template" : "logstash-*"

Finally, post the template back into Elasticsearch:

curl -XPUT 'http://localhost:9200/_template/logstash' -d@/tmp/logstash.template

Now, any index that is created after this with a name that matches the “template” value shown above will use this new template when creating the field mappings.

Testing your Template

Field mappings are set when a field is defined in the index.  They cannot be changed without reindexing all of the data.

If you use daily indexes, your next index will be created with the new mapping.  Rather than wait for that, you can test the template by manually creating a new index that also matches the pattern.

For example, if your template pattern was “logstash-*”, this will match the standard daily indexes like “logstash-2015.04.01” but will also match “logstash-test”.

Insert a document by hand into that index (here, “logstash-test”):

$ curl -XPUT 'http://localhost:9200/logstash-test/my_type/1' -d '{
 "field1" : "value1",
 "field2" : 2
}'
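
Then fetch the mapping from the new index to confirm the template was applied, and delete the test index when you’re done:

$ curl -XGET 'http://localhost:9200/logstash-test/_mapping?pretty'
$ curl -XDELETE 'http://localhost:9200/logstash-test'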

 
