Analyzing your App Insights Data with the ELK stack

Application Insights is a telemetry and instrumentation offering that gives you a great out-of-the-box experience for understanding what is happening with your sites and apps, web and mobile alike. The data viewing experience in the new Azure portal extends from high-level summary charts of your usage all the way down to viewing individual diagnostic events. The team is adding new capabilities to the portal experience at a rapid cadence, and for most users, most of the time, the portal is the best place to quickly understand what's going on with your code.

One of the great new features the App Insights team has delivered is the ability to export your telemetry data.  Enabling and configuring this is described here:

https://azure.microsoft.com/en-us/documentation/articles/app-insights-export-telemetry/

Once you can get the raw data out of App Insights, you can import it into other analysis or visualization tools. Today I'll talk a little bit about how you might do that with the ELK stack. ELK stands for "Elasticsearch, Logstash, Kibana", a suite of open source technologies designed for building rapid visualizations of semi-structured text data. Elasticsearch is the indexing system, Logstash is an agent for pumping data into Elasticsearch, and Kibana is a dashboard-building tool that works well with Elasticsearch.

For this posting, I'll be talking about the "custom event" event type that AI exposes.  A custom event allows you to manually insert telemetry points into your code base.  You choose the event name, and you can attach properties or measures to the event as necessary.  Your mileage may vary if you work with other event types.
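
For example, writing a custom event from C# with the AI SDK looks something like this (the event name, property, and measurement below are made up for illustration):

using System.Collections.Generic;
using Microsoft.ApplicationInsights;

var telemetry = new TelemetryClient();

// Hypothetical event: record a completed checkout, attaching one string
// property and one numeric measurement to the event.
telemetry.TrackEvent("CheckoutCompleted",
    properties: new Dictionary<string, string> { { "PaymentType", "CreditCard" } },
    metrics: new Dictionary<string, double> { { "CartTotal", 59.99 } });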

App Insights custom events are exported as Azure storage blobs into the storage account you configure when you enable exporting. The blob files themselves are organized by application, date, and hour. Within each per-hour directory, one or more blob files with semi-random names will be present. Each blob file will contain one or more events, each of which is a separate JSON object on its own line of text.
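
Purely as an illustration, and trimmed down to the three fields this post cares about (the exact shape of your exported events may differ, and the ID below is a placeholder GUID), a single event line looks roughly like this, pretty-printed here for readability:

{
  "event": [ { "name": "checkoutcompleted" } ],
  "internal": { "data": { "id": "00000000-0000-0000-0000-000000000000" } },
  "context": { "device": { "locale": "en-GB" } }
}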

If you'd like to do a one-time download of all of your blobs, doing a recursive copy with AzCopy.exe is a great way to get the data onto a local machine. A more elegant solution might be to have a worker role that monitors the storage account for new blobs and retrieves them as needed.
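
If you do go the custom route, here is a minimal sketch of the download side. It assumes the classic Microsoft.WindowsAzure.Storage SDK; the connection string, container name, and local folder are placeholders for your own export settings.

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Sketch: copy every exported blob down to a local folder, preserving the
// application/date/hour directory structure that the export feature uses.
static void DownloadExportedBlobs(string exportConnectionString)
{
    CloudStorageAccount account = CloudStorageAccount.Parse(exportConnectionString);
    CloudBlobContainer container = account.CreateCloudBlobClient()
        .GetContainerReference("my-export-container");

    foreach (IListBlobItem item in container.ListBlobs(null, useFlatBlobListing: true))
    {
        var blob = (CloudBlockBlob)item;
        string localPath = Path.Combine(@"C:\aiexport",
            blob.Name.Replace('/', Path.DirectorySeparatorChar));
        Directory.CreateDirectory(Path.GetDirectoryName(localPath));
        blob.DownloadToFile(localPath, FileMode.Create);
    }
}

A real worker role would also want to keep track of which blobs it has already processed, so that it only pulls down new ones.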

Once you've got a blob, you need to insert its contents into Elasticsearch. You could adapt the code in the article above and have it use Elasticsearch.Net to insert each line of JSON into your ES instance.

Consider the following code:

IElasticsearchClient client = new ElasticsearchClient();

string indexName = "myevents";
string typeName = "aievent";

// directoryFullOfBlobs is an enumerable of local file paths,
// one per downloaded blob file.
foreach (var blob in directoryFullOfBlobs)
{
    string s = File.ReadAllText(blob);
    foreach (string eventString in s.Split(new[] { '\r', '\n' }))
    {
        // eventString is a JSON object representing one event.
        if (eventString != "")
        {
            string lowerCaseEventText = eventString.ToLower();
            client.Index(indexName, typeName, lowerCaseEventText);
        }
    }
}
 

(Warning: Don't run this code and insert your events into ES just yet. Read the whole post to see why)

Assuming that you have some enumerable of blob files, this code would read them, split each file up into separate events, transform the entire event into lower case, and then insert the event body into the default ES server, using index "myevents" and object type "aievent".

(The case transformation is optional - in our environment, we've seen some case inconsistencies which break certain reports, so I normalize the JSON to all lower case.)

Note that we're not using the "L" part of the ELK stack here - you could probably create a Logstash plugin that does the event insertion instead of writing custom code. If you come up with a good one, please let me know.

At this point, your AI events should start showing up in kibana. However, if you start looking at the data, you may notice some things aren't showing up quite the way you like. For one thing, the locale field of your users is normally in the form of "languagecode-countrycode", with examples being en-US, fr-FR, or en-GB. The default string tokenizer in ES splits up this field on the dash character. Kibana draws you a pie chart with values like "en", "GB", and "fr". This is not what you want.
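
You can see the tokenization for yourself by asking ES to analyze a locale value (assuming a default local install listening on port 9200):

curl -XGET "http://localhost:9200/_analyze?analyzer=standard&text=en-GB"

The value comes back as two separate tokens instead of the single "en-GB" term you were hoping to chart.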

Similarly, if your event names have any sort of separator characters in them, they'll be tokenized into segments.

Finally, when we inserted these events, we didn't specify an ID for them, so ES auto-generated an ID for each event we inserted.  This is convenient, but it means that if we re-insert the same event more than once, it will be duplicated in our ES index and subsequently in any reports we build with Kibana.

App Insights events already have a durable unique ID, so it would be nice if we could tell ES to simply honor the ID that's already present.

It turns out we can fix all three of these problems by adding an ES mapping to our index before we insert any data into it.  The mapping I present here is one that I've been happy with; you can use it as a starting point for your own ES indexes.  The mapping does three things:

1) It maps the AI internal event ID to be the ES document ID.

2) It adds a second property for the event name that won't be tokenized.  This lets you report on the full event name.

3) It adds a second property for the locale that won't be tokenized.  This lets you report on the full locale value.

 

To create this mapping, first create the index:

curl -XPUT http://localhost:9200/indexname/

then inject the mapping:

curl -XPOST http://localhost:9200/indexname/aievent/_mapping -d '

{
  "aievent": {
    "_id": {
      "path": "internal.data.id"
    },
    "properties": {
      "context": {
        "properties": {
          "device": {
            "properties": {
              "locale": {
                "type": "multi_field",
                "fields": {
                  "locale": { "type": "string" },
                  "originallocale": { "type": "string", "index": "not_analyzed" }
                }
              }
            }
          }
        }
      },
      "event": {
        "properties": {
          "name": {
            "type": "multi_field",
            "fields": {
              "name": { "type": "string" },
              "originalname": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }
    }
  }
}
'
 

This mapping adds "event.originalname" and "locale.originallocale" to the list of properties you can visualize in Kibana.

Note that you can add this mapping after you've inserted your data, but the ID assignment and additional fields will only show up for data you insert after the mapping is created.  It is best to apply the mapping to an empty index and then insert your AI data afterwards.
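
If you want to confirm the mapping is in place before you start loading data, you can read it back from the index:

curl -XGET http://localhost:9200/indexname/_mapping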

If you're interested in seeing your AI telemetry data in Kibana and building custom visualizations above and beyond what the AI portal supports today, hopefully this post gives you a head start.