Software Performance for Metro Style Applications

With the Windows Consumer Preview out the door, I thought it would be interesting to write something about creating great performing Windows applications.  I hope to have a lot more to say about this in the future but I think really the most important things I have to say are more inspirational rather than informational at this point. The key point is that things are quite a bit different than they were, and how you think about creating an application with great performance is perhaps different than it ever has been.

OK that’s a mouthful, what am I talking about?

Well one thing is the operating system itself is a lot more frugal. Improvements in Win7 and continuing improvements in Win8 mean you get more of your system resources available for your applications. Great news!  And what are those applications going to do with those resources?  Why use them of course!  Hopefully wisely, to create great experiences when they are active and economically fade to nothing when they are not.  As we’ll see, resource usage is more temporal in Win8 and so management is essential.

Back when I worked on the Whidbey CLR we thought a modest forms-based application ought to be able to start up in something like 50ms and use something roughly like 5MB of working set.

In many ways we considered the working set number to be too big. We addressed that at least in part by making as much of that memory shared as possible – good for amortizing the cost over many processes.  Big fractions of the private memory came from things like the GC heap’s initial allocation, and pre-committed stack pages plus other fixed costs but much of it had to do the raw cost of getting a CLR application going.  Still that’s where we were:  50ms and maybe 5MB, fighting to go down from there.

Fast forward to 2012 and things look a lot different.  A metro style application might spend say 9 frames @60 Hz fading in a splash screen (call it 150ms) and then another 9 fading it out.  And then maybe as many doing some cool slide-in type effect as the title comes in from one side and the data from the other.  These kinds of effects are part of the Metro look and feel.  Now at that point we haven’t actually done anything per se and we’ve spent maybe 450ms just on animations.

Memory wise, to do those transitions we probably have spent private memory to the tune of 3 full screen surfaces.  Easily 4MB each (at 1366x768) with probably 2 live at any given moment and that’s just the memory for the video.  Clearly the world is a different place.  And already the temporal nature of resource usage is showing clearly, those costs are short term.

So what does it mean?

This metro style application experience, despite the fact that it takes longer to start up, is actually much more pleasing to use.  It responded instantly to our command and fit nicely into the expected design.  I think that’s the most important observation: in a metro style application we do get some more latitude to use video memory especially for effects but the price we pay is that we have even less latitude for stalls.  People expect a smooth responsive system at all stages.  No video glitches, no jerkiness, no synchronous pauses to get resources, no long wait-cursors.  The expectation has been set from the first frame that your experience will be a lovely one and applications that don’t deliver will look like old klunkers.

Let’s also consider where this extra memory came from: video memory is in many ways the cheapest in terms of time-to-get-ready.  In previous frameworks we were dealing with mostly code that had to be brought in and working set was a good proxy to startup time because working set tends to translate directly to disk IO.  Clean video surfaces don’t have to be paged in so that (nearly) 1:1 relationship between working set and startup time is ancient history.  Which isn’t to say you can now load all the code you like: you can’t, especially not all at once because page faults turn into glitches.  Like all your other resources you’ll want to get them loaded asynchronously and have a good UI experience while they are coming in.  That means don’t load your whole program at startup, don’t initialize everything you might ever need, and create suitable transitions and incremental experiences for data and program parts arriving.  In a great application all of that is part of the overall experience.

This is why I think it’s even harder on the programmer to get all these things right and why it’s even more important than ever for the frameworks to give you a Pit of Success experience when you use the normal templates and the normal services to get your job done.   If you aren’t careful then you’ll destroy the intended experience and even “a little bit broken” will look terrible in contrast to the expectation set by the design.

Compounding all of this is our desire to run metro style applications on lower end hardware.  Some of these systems are equipped with decent GPUs (though not the greatest)  but their CPUs offer a raw compute speed that might be say 1/6th (on a per core basis) of a high end desktop.  If you want your application to run well in that environment you’ll need to think carefully about how much work you do in one transition and how you use asynchronous patterns to keep you application feeling crisp even with CPU resources that are less abundant.   Careful testing and measuring on these lower end systems will be essential because your normal development machine could delude you with speed you don’t have.

And as if that’s not enough, almost universally power consumption is a huge concern.  In unplugged scenarios, whether it’s laptop or tablet every joule consumed matters.  In some cases a core feature of the tablet is the excellent battery life and hence the lower power CPU and so forth.  As an application or framework author you must consider these things very carefully.  The system will want to suspend your application when it isn’t running – how will this affect the way you store your state?  You will want to have lovely animations while your application is active because that’s what a great application does but you’ll want to be able to turn those off when you aren’t visible.  Background activity generally should be looked at with great suspicion because it will drain power.  What background work must happen?  What could be reduced or suspended entirely?    Great choices will mean superior battery life which could be a huge aspect of your success.

Despite these challenges I’m optimistic.  I think the Metro design is a great one because has a lovely look while at the same time it is not so terribly hard to create that you must spend all resources to make it.  The Metro look and feel is largely composed of things that GPUs do comparatively well.  That’s good news for everyone and it’s a good basis to create your applications.  Asynchronous design patterns are possible in all the languages WinRT supports and especially in Javascript, which is likely to be the most popular choice by volume. Asynchronous patterns are already normal for most javascript libraries.

Bottom line: what constitutes a great performing windows application has changed in the metro world,  but great performance is still important, perhaps more important than ever.

As I said above, I hope to write more about these concerns in the coming weeks but for now I'd like to refer you to some articles that have already been written on MSDN.  It's not everything by a longshot but its something.  While this particular work discusses the problems in the context of javascript, most of the advice is actually broadly applicable no matter what framework you use.   Check it out the guidance here.