{"id":145,"date":"2015-10-22T22:36:53","date_gmt":"2015-10-22T22:36:53","guid":{"rendered":"http:\/\/svops.com\/blog\/?p=145"},"modified":"2015-10-22T22:36:53","modified_gmt":"2015-10-22T22:36:53","slug":"duplicated-elasticsearch-documents","status":"publish","type":"post","link":"http:\/\/svops.com\/blog\/duplicated-elasticsearch-documents\/","title":{"rendered":"Duplicated elasticsearch documents"},"content":{"rendered":"<h2>Intro<\/h2>\n<p>The first thing to notice is that the documents probably have different _id values, so the problem then becomes, &#8220;who is inserting duplicates??&#8221;.<\/p>\n<p>If you&#8217;re running logstash, some things to look at include:<\/p>\n<ul>\n<li>duplicate input{} stanzas<\/li>\n<li>duplicate output{} stanzas<\/li>\n<li>two logstash processes running<\/li>\n<li>bad file glob patterns<\/li>\n<li>bad broker configuration<\/li>\n<\/ul>\n<h2>Duplicate Stanzas<\/h2>\n<p>Most people aren&#8217;t silly enough to deliberately create duplicate input or output stanzas, but there are still easy ways for them to occur:<\/p>\n<ul>\n<li>a logstash config file you&#8217;ve forgotten (00-mytest.conf)<\/li>\n<li>a backup file (00-input.conf.bak)<\/li>\n<\/ul>\n<p>Remember that logstash will read in all the files it finds in your configuration directory, not just the ones ending in .conf!<\/p>\n<h2>Multiple Processes<\/h2>\n<p>Sometimes your shutdown script may not work, leaving you with two copies of your shipper running. Check with &#8216;ps&#8217; and kill off the older one.<\/p>\n<h2>File Globs<\/h2>\n<p>If your file glob pattern is fairly open (e.g. 
&#8220;*&#8221;), you might be picking up files that have been rotated (&#8220;foo.log&#8221; and &#8220;foo.log.00&#8221;).<\/p>\n<p>Logstash-forwarder sets a &#8216;file&#8217; field on each event, so you can compare it across the duplicates to see whether they came from different files.<\/p>\n<p>If you&#8217;ve enabled _timestamp in elasticsearch, it will show you when each of the duplicates was indexed, which might give you a clue.<\/p>\n<h2>Brokers<\/h2>\n<p>As for brokers, if you have multiple logstash indexers reading from the same broker without some locking mechanism, more than one of them may pick up and index the same event.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Intro The first thing to notice is that the documents probably have different _id values, so the problem then becomes, &#8220;who is inserting duplicates??&#8221;. If you&#8217;re running logstash, some things to look at include: duplicate input{} stanzas duplicate output{} stanzas &hellip; <a href=\"http:\/\/svops.com\/blog\/duplicated-elasticsearch-documents\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts\/145"}],"collection":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/comments?post=145"}],"version-history":[{"count":1,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts\/145\/revisions"}],"predecessor-version":[{"id":146,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts\/145\/revisions\/146"}],"wp:attachment":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/media?parent=145"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/categories?post=145"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/tags?post=145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}