Measuring Results

A couple of us on the UI team were having a conversation about the comments to
yesterday's post.  Someone pointed out that it appears we've discovered
a community on the internet of people just as obsessed about UI as we are around
here. :)

I wanted to build on yesterday's conversation by helping you to understand more about how
we measure results.

When you boil it down, we have a pretty straightforward set of high-level
goals for the Office 12 UI redesign.  Help people use more of the
functionality in Office.  Help people create better-looking, richer
documents.  Save people time in doing the tasks they frequently do. 
Make sure people can be productive right away using the new version.  Help
normal people get results that only power users could get before.  Give
power users a richer set of powerful tools to go beyond what was possible
before.  Stuff like that.

So, how do you know when a feature is right?  There are so many tools at
our disposal.

The first line of defense is the people who work on the product.  I install a daily build of Office
literally every day.  As soon as a new piece of functionality is in, you
have to try it out and get that visceral feeling about "does it feel right?" 
Sometimes, no matter how good your spec seemed or your prototypes were, the
first time you play with it you know it feels wrong.  You never get a
second chance to form that first, instant impression of "is it right or not" and
I put a lot of value in my initial impression.

Of course, there are way more people than me using interim builds.  And
included in that set are the crustiest elite power-users in the world: Microsoft
employees.  You would think that Microsoft employees would be open to
change and sympathetic about "work in progress", but some of the most advanced
Office users in the world work here, and if you get in the way of their
productivity, you're going to hear about it.  These people represent the
upper crust of users, exercising a cross-section of just about every feature in
Office.  So they provide a constant stream of opinions from people
who are already experts with previous versions of Office.

We do usability tests, as I've talked about many times.  The test subjects
range literally from people who have no experience with productivity software at
all to experts who make a living writing Office add-ins.  We do tests in
Redmond, of course, but we also do remote testing at sites all around the United
States and in our labs in Europe and Asia.

A common misconception is that a usability test is all about data--that we
receive a 100-page report full of graphs and tables and we average the data and
it makes design decisions for us.

It's not actually the raw data, though, that makes usability so compelling. 
Most of the time, it's the "a-ha" moment you have in watching someone with a
different background and way of thinking from you use the software.  Often,
within the first 5 minutes you can see that you've failed, and you don't
need a sheet of data to tell you that.  It's a humbling experience to sit
and watch people struggle.  And your job as a PM is to figure out
why and to fix it.

Yesterday's story about "Eat Dismiss Clicks" is an
example of this.  One can argue the theoretical implications of focus
issues until the cows come home, but watching people all around the world, of
all different skill levels, fail again and again in the same way tells you the
design is wrong.  When you repair the design and then see the same
diversity of people succeed at the same tasks, you know you've done the right
thing.  It's not about some computer spitting out data, it's about watching
the experiences people have interacting with the software.  Watching their
faces, hearing what they have to say.

Data from usability comes into play when answering questions like "what features
are the hardest to find" or "where does it take people longer to do something
than it used to."  For instance, we can benchmark how long it takes people
to make particular kinds of documents and see where people do great and where
they struggle.  We can then take a more in-depth look to see why
people are struggling in certain areas and make improvements.
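
To make that benchmarking idea a bit more concrete, here's a minimal, hypothetical
sketch of how timed task results from usability sessions could be summarized to spot
trouble areas.  The task names, timings, and thresholds below are invented for
illustration; this isn't our actual tooling or data.

    from statistics import median

    # Hypothetical timed-task results from usability sessions, in seconds.
    # None means the participant gave up before finishing the task.
    results = {
        "insert a table":         [14, 22, 18, 35, 16],
        "add a chart caption":    [95, None, 120, 88, None],
        "apply a document theme": [40, 33, 51, 29, 38],
    }

    for task, times in results.items():
        finished = [t for t in times if t is not None]
        success_rate = len(finished) / len(times)
        typical = median(finished) if finished else float("inf")
        # A low success rate or a long median time flags a task worth a closer look.
        flag = "  <-- look closer" if success_rate < 0.8 or typical > 60 else ""
        print(f"{task:25} success {success_rate:.0%}  median {typical:5.1f}s{flag}")

The point of a summary like this isn't to make the decision for you; it just tells you
which tasks deserve a more in-depth look at why people are struggling.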

The next source of feedback is all of the many people who have been using the
product over the last six months.  We have MVPs and a program of technical
professionals who have been giving us feedback during all this time.  For
the last three months, we've been receiving feedback from thousands of beta
testers.  We have rollouts of Office 12 in businesses on three continents,
with people using it every day to get their jobs done and telling us what works and what
doesn't.  Having thousands of vocal Office users providing a constant
stream of feedback gives you a good idea of
what people like and where parts of the product need some more thought or some
more work.

There's all of you here on the UI blog as well, and people writing about Office
12 all over the internet.  We read the things you write, noting what you
like and the questions you have.

And of course, the last piece--yes, we do have data.  Through the Customer
Experience Improvement Program, we can look at aggregated statistics about what
features people use, how they use them (keyboard, mouse), when they tab switch
in the Ribbon, and a lot of other things.  This provides a general
"heartbeat" of how the overall project is doing and complements all of the
anecdotal feedback we get from people using the product.
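
As a rough illustration of what "aggregated statistics" can mean, here's a small,
hypothetical sketch that tallies anonymized command usage by input method.  The event
records, field names, and commands are assumptions made up for this example, not the
actual Customer Experience Improvement Program pipeline.

    from collections import Counter

    # Hypothetical, anonymized usage events: (command, input method).
    events = [
        ("Paste", "keyboard"), ("Paste", "mouse"), ("Bold", "keyboard"),
        ("InsertChart", "mouse"), ("SwitchRibbonTab", "mouse"), ("Paste", "keyboard"),
    ]

    by_command = Counter(cmd for cmd, _ in events)
    by_method = Counter(events)

    print("Most-used commands:", by_command.most_common(3))
    print("Keyboard share of Paste: "
          f"{by_method[('Paste', 'keyboard')] / by_command['Paste']:.0%}")

Aggregates like these are the "heartbeat": they show broad patterns across a huge
number of people, while the anecdotal feedback explains the why behind them.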

So, how do we measure results against our high-level goals?  We have to
synthesize all of those inputs.  Yes, there is a ton of data.  We get
bushels of anecdotal opinions.  We talk to partners and beta users. 
We watch people use the software here and in their place of work.  We watch
our parents use it.  We watch our children use it.  We talk to people
at the grocery store, or on the airplane.  We look at what you write here. 
And we try to stay true to our design tenets around simplicity, efficiency,
predictability, respect for screen real-estate...

All of these things combine to provide the true picture of how we measure success.