Monthly Archives: November 2013

Monitoring using the New Relic API

A client has some code that is instrumented with a New Relic agent. We wanted to track the performance of individual portions of the code – mostly dependencies on other services like databases and third-party data sources. Rather than have yet another alerting platform, we wanted to pull the information into Nagios. Fortunately, New Relic offers an API that’s pretty easy to use.

The first step is to enable API access in your New Relic account and get the API key. According to the New Relic documentation, the steps are:

  1. Sign in to the New Relic user interface.
  2. Select (account name) > Account settings > Integrations > Data sharing > API access.
  3. Click Enable API Access, and then copy or make a note of your API key.

Once you have the API key, the first request you’ll want to make is to get your account_id. Try this:

curl -gH "x-api-key:YOUR_API_KEY" ''

Note that the account_id also appears in the page URLs when you’re logged in to the New Relic website.

With that done, there are only a handful of URLs that you might want to hit. New Relic breaks things down by application, so you’ll need a list of those:

curl -gH "x-api-key:YOUR_API_KEY" ''

The results of that call will not only give you the IDs for each application, but also links to the Overview and Servers pages. Again, note that the application ID appears in the page URLs when you’re on the New Relic website.

Grab a list of the metrics that are available for the application:

curl -gH "x-api-key:YOUR_API_KEY" ''

And finally, pull a statistic for that metric:

curl -gH "x-api-key:YOUR_API_KEY" '[]=YOUR_METRIC_NAME&field=call_count&begin=2013-11-14T00:00:00&end=2013-11-14T23:59:59&summary=1'

This will return XML-formatted data for that metric for a single day. With “summary=1”, you get only one row returned. To get smaller buckets throughout the day, leave “summary” off.
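The query parameters shown in that request can also be assembled in code. Here is a minimal Python sketch, assuming a one-day window and the same `field`, `begin`, `end`, and `summary` parameters as the curl example. The function name `build_query` is hypothetical, and the endpoint and the metric-name parameter are left to the caller.

```python
from datetime import date
from urllib.parse import urlencode


def build_query(day: date, field: str, summary: bool = True) -> str:
    """Build the field/begin/end/summary part of the query string.

    `build_query` is an illustrative helper, not part of any New Relic
    library. It covers one whole calendar day, as in the curl example.
    """
    params = {
        "field": field,
        "begin": f"{day.isoformat()}T00:00:00",
        "end": f"{day.isoformat()}T23:59:59",
    }
    if summary:
        # summary=1 asks for a single row; omit it to get smaller buckets.
        params["summary"] = 1
    # Keep ":" unescaped so the timestamps stay readable.
    return urlencode(params, safe=":")
```

Calling `build_query(date(2013, 11, 14), "call_count")` reproduces the parameters used in the curl request; passing `summary=False` leaves “summary” off.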

In a quick scan, we didn’t find a way to get more than one metric value per call, so we make multiple calls to get what we need.
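One way to organize those repeated calls is a loop over every metric and field pair. This is a rough Python sketch, not the plugin itself: `collect_values` and the `fetch_one` callable are hypothetical names, and `fetch_one` stands in for whatever code performs a single API request and returns one number.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def collect_values(
    metrics: Iterable[str],
    fields: Iterable[str],
    fetch_one: Callable[[str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Issue one call per (metric, field) pair and gather the results.

    `fetch_one` is an assumed, caller-supplied function that makes a
    single request for one metric and one field.
    """
    results = {}
    for metric, field in product(list(metrics), list(fields)):
        results[(metric, field)] = fetch_one(metric, field)
    return results
```

Passing the request function in as an argument keeps the loop easy to test with a stub, without touching the network.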

Note that you can use “data.json” or “data.csv” to have the data returned in different formats. We used XML during manual development and then switched to JSON when we started writing the Nagios plugin.

We now use this plugin to check over 50 metrics every three minutes for the client, pulling the average_response_time, max_response_time, and call_count.
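The plugin itself isn't shown here, but the threshold step of a Nagios check might look something like the following Python sketch. It assumes the metric value has already been extracted from the API response, and that higher values are worse. The names `evaluate` and `format_status` are hypothetical. The exit codes 0, 1, 2, and 3 are the standard Nagios plugin return codes.

```python
from typing import Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: Optional[float], warn: float, crit: float) -> int:
    """Map a metric value to a Nagios return code (higher is worse)."""
    if value is None:
        # No data came back, so the state cannot be determined.
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_status(name: str, value: Optional[float],
                  warn: float, crit: float) -> Tuple[int, str]:
    """Return the exit code and the one-line status text for Nagios."""
    code = evaluate(value, warn, crit)
    shown = "no data" if value is None else f"{value:g}"
    return code, f"{LABELS[code]} - {name}={shown}"
```

A real plugin would print the status line and then call `sys.exit()` with the returned code, since Nagios reads the state from the process exit status.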

Monitoring and Alerting

Monitoring is a good thing (duh!).  It’s one of the core functions that an operations group provides.  As with most things, there are good ways and bad ways of doing monitoring.

When you monitor, you end up with some kind of dashboard, say with Nagios:



Which is very helpful, when you’re looking at it.

Ever seen a setup like this?


I think people who build systems like this really liked the movie WarGames, but have missed one important difference – in the movie, it was someone’s job to sit and watch the screens 24/7/365.  Do you have staff for that?  Should we treat people that way?

In the real world, people have better things to do than stare at a monitor, waiting for some indicator to turn red.  Large displays like this become “monitoring theater” (see Security Theater) – basically fluff to make people think the system is being monitored. But, with those monitors sitting there, what happens when nobody is looking?


You must have alerting to make your monitoring worthwhile. Do you?


Welcome to!

Right now, this is just a blog, which draws on the experience of several Silicon Valley operations folks with a wide range of talents and opinions.

Please note that items that are being discussed may not relate to existing clients, but could be generic thoughts on the topic or relate to previous clients.

If something catches your eye, please leave a comment.