Combining lines with the multiline{} filter

The multiline filter combines messages that span several lines into a single event that other logstash filters can process easily.  Systems that throw large exceptions (e.g. Java) are the standard use case for this filter.

At its most basic, the filter needs three pieces of information:

  • ‘pattern’: the regular expression that each line is tested against.
  • ‘what’: whether a line selected by the pattern belongs with the ‘previous’ or the ‘next’ line.
  • ‘negate’: whether ‘what’ applies to lines that match the pattern (false) or to lines that don’t match it (true).

When ‘negate’ is set to true, read it as “when my PATTERN doesn’t match, do WHAT”; when false, read it as “when my PATTERN does match, do WHAT”.

In this example, ‘negate’ is true, so we read it as “when my timestamp pattern doesn’t match, keep the line with the previous entry”:

filter {
    multiline {
      negate => true
      pattern => "^%{TIMESTAMP_ISO8601} "
      what => "previous"
    }
}
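
For contrast, here is a sketch of the ‘negate’ false case, assuming continuation lines start with whitespace, as the indented lines of a Java stack trace typically do.  Read it as “when a line starts with whitespace, keep it with the previous entry”:

filter {
    multiline {
      # Assumption: continuation lines are indented; adjust the pattern to your log format.
      negate => false
      pattern => "^\s"
      what => "previous"
    }
}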

Place this filter before any others in the filter{} section, so that the filters after it see the single combined event.
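
As a rough illustration of that ordering, the sketch below runs multiline ahead of a grok filter.  The field names ‘timestamp’ and ‘details’ are made up for the example, and the (?m) prefix is there so that GREEDYDATA can run across the newlines inside the combined event:

filter {
    multiline {
      negate => true
      pattern => "^%{TIMESTAMP_ISO8601} "
      what => "previous"
    }
    # grok runs second, so it sees the combined event rather than individual lines.
    grok {
      match => { "message" => "(?m)^%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:details}" }
    }
}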

Until a new line matches the pattern, logstash is expecting more lines to join, so it won’t release the combined event.  There is an enable_flush option, but it should not be used in production.  In logstash version 1.5, the flush will be “production ready”.

When using multiline, you cannot use multiple filter workers, as each worker would be reading a different line.  If you attempt this configuration, logstash will not start.

If your application writes log entries that can interleave with each other, the basic filter can’t help you.  However, if your system prints a common string in each message (a UUID, for example), you can use that to combine messages.  See the ‘stream_identity’ option.

You should also consider using the multiline{} codec, so that messages are combined in the input{} phase.  Note that the codec doesn’t offer the ‘stream_identity’ option.
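
A minimal sketch of the codec form, using the same timestamp pattern inside a file input.  The path is a placeholder, not a real location:

input {
    file {
      # Placeholder path; point this at your own log file.
      path => "/path/to/your/app.log"
      codec => multiline {
        negate => true
        pattern => "^%{TIMESTAMP_ISO8601} "
        what => "previous"
      }
    }
}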

 

2 responses to “Combining lines with the multiline{} filter”

  1. Rajesh Swarnkar

    Hello SVOps,

    I am trying to parse logs like:
    INFO 2015/10/10 01:00:23.247 32254 Some Greedy Texts
    DEBUG 2015/10/10 01:00:23.248 32254 Some greedy texts
    DEBUG 2015/10/10 01:00:23.382 32254 name=[value].
    name=[value].
    name=[value].

    Here INFO/DEBUG sometimes are multiline. I tried using multiline plugin, but that is not thread safe. So I cannot use it for production.

    I tried using codec multiline plugin something like:

    input {
        file {
            type => "cpplogs"
            path => "D:/testingpath/ES/logs/enginelog.txt"
            start_position => "beginning"
            codec => multiline {
                patterns_dir => "D:/testingpath/ES/logstash/bin/patterns"
                pattern => "^%{LOGGINGLEVEL}\s+"
                negate => true
                what => "previous"
            }
        }
    }
    But I observed that this does not flush the last line until the %{LOGGINGLEVEL}\s+ pattern is found at the beginning of the NEXT line. This is not good behaviour for production, is it?

    How do I get around this problem?

    • The multiline filter has a max_age feature that will flush an event instead of waiting for the next event. I don’t think this is available in the multiline codec.
