Things I learned about Performance Investigation: My talk at the Perf@Scale conference

Last Wednesday, I gave a talk at the one-day Perf@Scale conference entitled 'Keys to Actionable Performance Investigations'.

This talk is meant to tell people what I learned over my 10 years of doing performance investigations, and in particular which parts of that 'transfer' to any investigation.

My main take-aways are:

  1. You need detailed data (don't guess from top-level counters etc.), and that means STACKS.   And you need this data from where the perf problem actually is (typically in production), so you need LOW OVERHEAD sampling that can run IN PRODUCTION.   ETW does this on the Windows platform and PerfEvents does this on Linux.  (Thus you can collect with PerfView on Windows, or as explained in my blog on Linux.)
  2. Once you have achieved (1) you have so much data that you need a way of looking at SOME of it in high detail (your own code), while hiding the detail in most of it that you don't care about (OS and framework code etc.).   I argue that the operators that PerfView has for grouping, folding and filtering are really good at that.   I challenged people to see for themselves by collecting data that is relevant to them (on Windows, Linux or other) and using PerfView's ability to read external data, no matter where it comes from, to at least 'try it out' and convince themselves that these operators are indeed useful and worth incorporating into pretty much any performance tool.
  3. Because services typically have dozens of independent things going on simultaneously, you need extra events that mark the start and end of individual requests so that you can group all the activity from one request together.   Thus you need more than just CPU stacks, you need 'markers' that tell you what is important (things on your critical path).
  4. STACKS are super important, so if you lose them you have to 'fix' that.    Exactly this happens with async code.  Asynchronous techniques split up what would have been one logical thread of execution into dozens of 'bits' that are executed in small chunks on whatever thread happens to be available.   This means that a thread's stacks are no longer a really useful 'trail' of the execution of your program.   You fix this by adding more events to your async library so that you can track which chunks 'cause' other chunks, and you can 'stitch' the causality back together again.   It is not a 'call stack' any more but it is a 'causality stack', and that is basically what you want.
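
To make take-away (2) a bit more concrete, here is a minimal toy sketch in Python of what filtering, grouping and folding do to a set of sampled stacks. This is NOT PerfView's implementation or its pattern syntax. The sample data, the 'module!function' frame names and the helper names (`filter_samples`, `group_and_fold`, `exclusive_cost`) are all made up for illustration.

```python
from collections import Counter

EXTERNAL = "<external>"

# Hypothetical sampled stacks (outermost frame first), one tuple per CPU sample.
SAMPLES = [
    ("os!ThreadStart", "framework!Dispatch", "myapp!HandleRequest", "myapp!Parse", "framework!StringCopy"),
    ("os!ThreadStart", "framework!Dispatch", "myapp!HandleRequest", "myapp!Parse", "os!memcpy"),
    ("os!ThreadStart", "framework!Dispatch", "myapp!HandleRequest", "myapp!Render"),
    ("os!ThreadStart", "framework!Dispatch", "otherapp!Background"),
]

def module_of(frame):
    """Return the part of a 'module!function' frame name before the '!'."""
    return frame.split("!", 1)[0]

def filter_samples(samples, must_contain):
    """Filtering: keep only the samples whose stack includes the given frame."""
    return [s for s in samples if must_contain in s]

def group_and_fold(stack, my_module):
    """Grouping: collapse every run of frames outside my_module into one
    '<external>' node. Folding: drop a trailing '<external>' node so that its
    cost is charged to the last frame that belongs to my_module."""
    out = []
    for frame in stack:
        if module_of(frame) == my_module:
            out.append(frame)
        elif not out or out[-1] != EXTERNAL:
            out.append(EXTERNAL)
    if len(out) > 1 and out[-1] == EXTERNAL:
        out.pop()
    return out

def exclusive_cost(samples, my_module):
    """Count samples by the leaf frame that remains after grouping and folding."""
    return Counter(group_and_fold(s, my_module)[-1] for s in samples)

if __name__ == "__main__":
    relevant = filter_samples(SAMPLES, "myapp!HandleRequest")
    for frame, count in exclusive_cost(relevant, "myapp").most_common():
        print(count, frame)
```

The point of the sketch is only that after these three operations every remaining sample is attributed to a function in your own code, which is the view you want to start an investigation from.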
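
Take-aways (3) and (4) can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event format (a `chunk` id, a `caused_by` id, a `thread` and a `label`), the sample events and the function names are all hypothetical, and real async libraries log this information in their own way. The idea it shows is that if every chunk of work records which chunk scheduled it, you can walk those links backwards to rebuild a 'causality stack', and the request's start marker ends up at the root.

```python
# Hypothetical event log from an instrumented async library. Each chunk of work
# records its own id, the id of the chunk that scheduled it (None for the chunk
# that starts a request), the thread it happened to run on, and a label.
EVENTS = [
    {"chunk": 1, "caused_by": None, "thread": 7, "label": "Request A start"},
    {"chunk": 2, "caused_by": None, "thread": 7, "label": "Request B start"},
    {"chunk": 3, "caused_by": 1, "thread": 4, "label": "ReadDatabase"},
    {"chunk": 4, "caused_by": 2, "thread": 4, "label": "ReadCache"},
    {"chunk": 5, "caused_by": 3, "thread": 9, "label": "FormatReply"},
]

def causality_stack(events, chunk_id):
    """Follow the 'caused_by' links from a chunk back to the request start and
    return the labels from outermost (the request marker) to innermost."""
    by_id = {e["chunk"]: e for e in events}
    stack = []
    current = chunk_id
    while current is not None:
        event = by_id[current]
        stack.append(event["label"])
        current = event["caused_by"]
    stack.reverse()
    return stack

def group_by_request(events):
    """Group all chunk labels under the request marker at the root of their
    causality stack, regardless of which thread each chunk ran on."""
    groups = {}
    for e in events:
        root = causality_stack(events, e["chunk"])[0]
        groups.setdefault(root, []).append(e["label"])
    return groups

if __name__ == "__main__":
    print(" -> ".join(causality_stack(EVENTS, 5)))
    for request, labels in group_by_request(EVENTS).items():
        print(request, labels)
```

Notice that chunks 3 and 4 ran on the same thread but belong to different requests, and that Request A's work is spread over three threads. The thread tells you very little here; the causality links are what put the pieces back together.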

The link above will show you all the talks that day, including mine.   I was limited to 35 minutes, so the talk was mostly meant to inspire you to learn more.

Well, here is where you can learn more.

Attached below is a (120 Meg) file called '' that contains

  • The slides for the talk
  • The complete source code for the demos that are not already included in PerfView itself, in particular:
    • SyncTutorial - a trivial one-page program that simulates a server running many things in parallel.
    • AsyncTutorial - also simulates a server, but uses async techniques in its implementation.
  • The data files that I looked at during the presentation (thus you can open them up without having to build anything; simply download PerfView and open them).
  • A file that shows you data that was collected on Linux.  You can open this file in PerfView as well to see what Linux data looks like. 

So for those who are inspired, this blog entry lets you 'replicate' what I did in the talk yourself and understand it in a way that you can't possibly get by simply watching a 35 min video.