{"id":36,"date":"2015-01-13T22:13:41","date_gmt":"2015-01-13T22:13:41","guid":{"rendered":"http:\/\/svops.com\/blog\/?p=36"},"modified":"2015-01-14T18:18:58","modified_gmt":"2015-01-14T18:18:58","slug":"monitoring-your-log-files","status":"publish","type":"post","link":"http:\/\/svops.com\/blog\/monitoring-your-log-files\/","title":{"rendered":"Monitoring your log files"},"content":{"rendered":"<h1>Overview<\/h1>\n<p>If you&#8217;ve set up your ELK cluster and logs are flowing in from your shippers, you&#8217;re now sitting on a goldmine of data. \u00a0The question becomes, &#8220;what should I do?!??&#8221;<\/p>\n<p>A first step is to make Kibana dashboards, but they provide little value in a lights-out environment (see\u00a0<a title=\"Monitoring and Alerting\" href=\"http:\/\/svops.com\/blog\/?p=11\">http:\/\/svops.com\/blog\/?p=11<\/a>).<\/p>\n<p>When you&#8217;re ready to actively monitor the information that&#8217;s sitting in the cluster, you&#8217;ll want to pull it into your monitoring system (Nagios, Zabbix, ScienceLogic, whatever).<\/p>\n<p>There are many benefits to this approach over Logstash&#8217;s built-in notifications, including:<\/p>\n<ul>\n<li>one alerting system (common message format, distribution groups, etc.)<\/li>\n<li>one escalation system (*)<\/li>\n<li>one acknowledgement system (*)<\/li>\n<li>one dashboard for monitoring<\/li>\n<\/ul>\n<p>(*) Logstash doesn&#8217;t provide these features.<\/p>\n<p>This approach is also better than using Logstash&#8217;s Nagios-related plugins, since you&#8217;ll be querying all the documents in Elasticsearch, not just one document at a time. \u00a0You&#8217;ll also be using Elasticsearch as a database, rather than using Logstash&#8217;s metrics{} functionality as a poor substitute.<\/p>\n<p>There are two systems that you should build. 
\u00a0I&#8217;ll reference Nagios as the target platform.<\/p>\n<h1>Individual Metrics<\/h1>\n<p>The total number of Java exceptions that have occurred, pulled from Elasticsearch with a single query, is a good example of an individual metric.<\/p>\n<p>In Nagios, you would first define a virtual host (e.g. &#8220;elasticsearch&#8221;, &#8220;java&#8221;, &#8220;my_app&#8221;) and a virtual service (e.g. &#8220;java exceptions&#8221;). \u00a0The service would run a new command (e.g. &#8220;run_es_query&#8221;). \u00a0Set the check interval to something that makes sense for your organization.<\/p>\n<p>The magic comes in writing the underlying program that is run by the &#8220;run_es_query&#8221; command. \u00a0This program should take a valid Elasticsearch query_string as a parameter, and run it against the cluster.<\/p>\n<p>In the Nagios world, the script reports its status through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). \u00a0The output of the script can also include performance data, which is\u00a0used for charting.<\/p>\n<p>The <a href=\"http:\/\/www.elasticsearch.org\/guide\/en\/elasticsearch\/client\/python-api\/current\/\">Python Elasticsearch module<\/a> makes writing the script pretty easy. \u00a0Write one script for each query type (max, count, most recent document, etc.); this keeps each script from becoming so generic that it is unreadable.<\/p>\n<h1>Bulk Metrics<\/h1>\n<p>If you want to count Java exceptions but report them on a machine-by-machine basis, you should not launch the &#8220;individual metric&#8221; command once for each physical host. 
\u00a0Doing this would run one query per host against Elasticsearch, which doesn&#8217;t scale well at all.<\/p>\n<p>The better alternative is to run one &#8220;bulk&#8221; script that pulls the data for all hosts from Elasticsearch and then passes that information to Nagios using the &#8220;<a href=\"http:\/\/nagios.sourceforge.net\/docs\/nagioscore\/4\/en\/passivechecks.html\">passive check<\/a>&#8221; system. \u00a0Nagios will react to the information as configured.<\/p>\n<h1>Where&#8217;s the Code?<\/h1>\n<p>I&#8217;ve written this plugin a few times for different platforms, but always as (unsharable) work-for-hire. \u00a0I hope to rewrite this in my spare time some day, but this outline should get you started.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview If you&#8217;ve set up your ELK cluster and logs are flowing in from your shippers, you&#8217;re now sitting on a goldmine of data. \u00a0The question becomes, &#8220;what should I do?!??&#8221; A first step is to make Kibana dashboards, but they &hellip; <a href=\"http:\/\/svops.com\/blog\/monitoring-your-log-files\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,3,6],"tags":[],"_links":{"self":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts\/36"}],"collection":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/comments?post=36"}],"version-history":[{"count":5,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts\/36\/revisions"}],"predecessor-version":[{"id":42,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/posts\/36\/revisions\/42"}],"wp:attachment":[{"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/media?parent=36"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/categories?post=36"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/svops.com\/blog\/wp-json\/wp\/v2\/tags?post=36"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}