Introduction
The grok filter – and its use of patterns – is the truly powerful part of logstash. Grok allows you to turn unstructured log text into structured data.
grok
The grok filter attempts to match a field with a pattern. Think of patterns as a named regular expression. Patterns allow for increased readability and reuse. If the pattern matches, logstash can create additional fields (similar to a regex capture group).
This example takes the event’s “message” field and attempts to match it against five patterns (e.g. “IP”, “WORD”). If the entire expression matches, grok adds a field for each named pattern (the “IP” match is stored in the “client” field, and so on).
filter {
  grok {
    match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
  }
}
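To make this concrete, here is a hypothetical input line in that shape and the fields grok would extract from it (the values are illustrative, not from any real log):

```
# input line:
55.3.244.1 GET /index.html 15824 0.043

# resulting fields:
client:   55.3.244.1
method:   GET
request:  /index.html
bytes:    15824
duration: 0.043
```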
If the input doesn’t match the pattern, the tag “_grokparsefailure” will be added to the event. You can (and should; see best practices) customize this tag.
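The grok filter’s tag_on_failure option overrides the default tag. A minimal sketch (the tag name here is arbitrary):

```
filter {
  grok {
    match          => [ "message", "%{IP:client} %{WORD:method}" ]
    tag_on_failure => [ "_my_grokparsefailure" ]
  }
}
```

Using a distinct tag per grok filter makes it easier to tell which of several filters failed to parse an event.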
Patterns
Logstash ships with lots of predefined patterns. You can browse them on GitHub.
Patterns consist of a label and a regex, e.g.:
USERNAME [a-zA-Z0-9._-]+
In your grok filter, you would refer to this as %{USERNAME}:
filter {
  grok {
    match => [ "message", "%{USERNAME}" ]
  }
}
Patterns can contain other patterns, e.g.:
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
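A composed pattern is used just like any other. For example, this would match a syslog-style timestamp such as “Dec 23 14:30:01” at the start of a message (the sample timestamp is illustrative):

```
filter {
  grok {
    match => [ "message", "%{SYSLOGTIMESTAMP}" ]
  }
}
```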
Target Variables
A pattern can store the matched value in a new field. Specify the field name in the grok filter:
filter {
  grok {
    match => [ "message", "%{USERNAME:user}" ]
  }
}
If you’re writing a raw regular expression instead of a named pattern, you can still capture into a new field using Oniguruma’s named-capture syntax:
filter {
  grok {
    match => [ "message", "(?<myField>[a-z]{3})" ]
  }
}
This would match three lowercase letters and store them in a field called ‘myField’.
Casting
By default, grok’ed fields are strings. Numeric fields (int and float) can be declared in the pattern:
filter {
  grok {
    match => [ "message", "%{NUMBER:bytes:int}" ]
  }
}
Note that this is just a hint that logstash passes along to elasticsearch when it inserts the event. If the field already exists in the index with a different type, this won’t change the elasticsearch mapping until a new index is created.
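An alternative is to convert the field after grok with the mutate filter. A sketch, assuming a “bytes” field was created by an earlier grok match:

```
filter {
  mutate {
    convert => [ "bytes", "integer" ]
  }
}
```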
Custom Patterns
While logstash ships with many patterns, you will eventually need to write a custom pattern for your application’s logs. The general strategy is to start slowly, working your way from the left of the input string, parsing one field at a time.
Your pattern does not need to match the entire event message, so you can skip leading and trailing information if you just need something from the middle.
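Custom patterns can be kept in their own files and loaded via the grok filter’s patterns_dir option. A sketch, assuming a pattern file at ./patterns/extra and an application that logs a hexadecimal queue ID:

```
# contents of ./patterns/extra (hypothetical pattern file):
QUEUEID [0-9A-F]{10,11}
```

```
filter {
  grok {
    patterns_dir => ["./patterns"]
    match        => [ "message", "%{QUEUEID:queue_id}" ]
  }
}
```

Each line in a pattern file is a label followed by a regex, exactly like the built-in patterns shown above.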
Grok uses Oniguruma regular expressions.
Be sure to use the debugger (see below) when developing custom patterns.
Debugging
There is an online grok debugger available for building and testing patterns.