Intro
The first thing to notice is that the duplicate documents probably have different _id values, so the problem then becomes: who is inserting the duplicates? (If they shared an _id, the second write would simply have overwritten the first instead of creating a copy.)
If you’re running logstash, some things to look at include:
- duplicate input{} stanzas
- duplicate output{} stanzas
- two logstash processes running
- bad file glob patterns
- bad broker configuration
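To make the duplicate-stanza failure mode concrete, here is a sketch of how a stale backup file produces double writes. The paths are examples, and the ‘hosts’ option is an assumption about your Logstash version (1.x releases used ‘host’ instead):

```
# /etc/logstash/conf.d/30-output.conf        (example path)
output {
  elasticsearch { hosts => ["localhost:9200"] }
}

# /etc/logstash/conf.d/30-output.conf.bak    (stale backup, but still loaded)
# logstash concatenates every file in the directory, so this second
# output stanza sends each event to elasticsearch a second time.
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```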
Duplicate Stanzas
Most people aren’t silly enough to deliberately create duplicate input or output stanzas, but there are still easy ways for them to occur:
- a logstash config file you’ve forgotten (00-mytest.conf)
- a backup file (00-input.conf.bak)
Remember that logstash will read in all the files it finds in your configuration directory!
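A quick way to audit what logstash will actually load is to list the directory and count how many files contain an output stanza. This is a sketch: ‘/etc/logstash/conf.d’ is the conventional path on package installs and may differ on your system, so the demo builds a throwaway directory instead:

```shell
# Stand-in for /etc/logstash/conf.d (the real path depends on your install)
confdir=$(mktemp -d)
touch "$confdir/00-input.conf"
printf 'output { }\n' > "$confdir/30-output.conf"
printf 'output { }\n' > "$confdir/30-output.conf.bak"   # forgotten backup

# logstash loads *everything* in the directory, .bak files included:
ls "$confdir"

# More than one file matching here means events may be emitted twice:
grep -l 'output {' "$confdir"/*
```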
Multiple Processes
Sometimes your shutdown script may not work, leaving you with two copies of your shipper running. Check it with ‘ps’ and kill off the older one.
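To see whether two copies of a shipper are alive, list the matching processes with their elapsed time and kill the older PID. The sketch below uses ‘sleep’ as a stand-in for the shipper so it can run anywhere:

```shell
# Two copies of the "same" long-running process ('sleep' stands in for the shipper)
sleep 300 & old_pid=$!
sleep 300 & new_pid=$!

# ps with elapsed time (etime) shows which copy has been running longer:
ps -o pid,etime,comm -p "$old_pid","$new_pid"

# Kill the older one; only the newer copy survives:
kill "$old_pid"
wait "$old_pid" 2>/dev/null || true
ps -o pid,comm -p "$new_pid"
```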
File Globs
If your file glob pattern is fairly open (e.g. “*”), you might be picking up files that have been rotated (“foo.log” and “foo.log.00”).
Logstash-forwarder sets a ‘file’ field that you can check in this case.
If you’ve enabled _timestamp in elasticsearch, it will show you when each of the duplicates was indexed, which might give you a clue.
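The difference between an open glob and a tight one is easy to demonstrate; the filenames below are examples:

```shell
# Throwaway directory with a live log and two rotated copies
logdir=$(mktemp -d)
touch "$logdir/foo.log" "$logdir/foo.log.00" "$logdir/foo.log.01"

# An open pattern like "*" also picks up the rotated copies...
ls "$logdir"/*

# ...while "*.log" matches only the live file:
ls "$logdir"/*.log
```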
Brokers
As for brokers: if you have multiple logstash indexers reading from the same broker, make sure the broker delivers each message to only one consumer. A queue-style transport (e.g. a Redis list) hands each event to a single reader, while a broadcast-style one (e.g. a Redis pub/sub channel) delivers a copy to every subscriber, so each indexer would index its own duplicate.