Designing the Windows 8 touch keyboard
Starting with the earliest TabletPC enhancements to Windows, we have been working on “on-screen keyboards.” With Windows 8, we started fresh and took a "first principles" approach to developing the touch keyboard. Given the amount of experience many of us have with touch keyboards for phones, and the myriad of touch devices we interact with these days, we set a very high bar for the quality of the experience and effectiveness of input with the new Windows 8 touch keyboard. In this post, Kip Knox, a member of the Windows User Experience program management team, details this work. --Steven
When we began planning how touch and new types of PCs might work on Windows 8, we recognized the need to provide an effective method for text entry on tablets and other touch screen PCs. Since Windows XP SP1, which had Tablet PC features built in, Windows has included a touchable on-screen keyboard. But those features were designed as extensions to the desktop experience. For Windows 8, we set out to improve on that model and introduce text input support that meets people’s needs, matches our design principles, and works well with the form factors we see today and expect to see in the future.
I’m writing this blog post on our Windows 8 touch keyboard using the standard QWERTY layout in English. As I look at it, the keyboard seems very simple and sort of obvious. This comes partly from having worked on it for a while, but also because keyboards are familiar to us. But there is more here than meets the eye (or, fingertips).
We started planning this feature area with no preconceived notions. As we do with all our features, we began the text input design project with a set of principles or goals. On a Windows 8 PC using touch, we want people to be able to:
- Enter text quickly, reasonably close to the speed with which they type on a physical keyboard
- Avoid errors, and be able to easily correct mistakes
- Enter text comfortably, in terms of posture, interaction with the device, and social setting
You might note that none of those goals explicitly assumes a keyboard. And when we started the project, we cast a broad net across possible approaches to text input. We found that of all the methods of text input we considered, none met the goals above as well as a keyboard. The majority of people are simply faster, more accurate, and more comfortable typing than they are writing any other way. Windows has highly accurate handwriting recognition in several languages, as well as advanced speech recognition, for example. But without a great touch keyboard, we were not going to be able to fulfill people’s needs and expectations for touch-screen devices running Windows. So we set out to create the best touch keyboard on any device.
Optimizing for comfort and posture
There are many ways to imagine touch keyboards on a tablet, and we sketched a lot of them—large keyboards, tiny keyboards, floating keyboards, circular keyboards, swipe keyboards. But our initial design process was grounded in research we did into the ways that people interact with tablets. Our researchers conducted an in-depth study in which they observed people “living with” tablets over a period of time. Through these observations and interviews, we saw a set of three postures that are most common among people using tablets:
- One hand holding the device, with one hand interacting with the user interface
- Two hands holding the device, with thumbs interacting
- Resting the device on a table, lap, or stand, and interacting with both hands
Research into people “living with” tablets revealed three common postures.
In these postures, people felt most natural and most likely to use the tablet for longer periods of time. We’ve made many design decisions in Windows 8 to optimize for these postures, and that includes how people intuitively input text. When typing on a tablet, most people either set it on their lap or a table and multi-finger type, or hold it in their hands and type with their thumbs, or hold it with one hand and “hunt and peck.”
Our standard touch keyboard layout is optimized for laying the tablet down and multi-finger typing, and also works well for typing with one hand. We also introduced a new layout we call the thumb keyboard (which we showed for the first time at our very first preview of Windows 8 about a year ago), which is designed for holding the tablet with two hands and typing with your thumbs. This keyboard is adjustable in size, to accommodate different hand sizes. An interesting observation from our posture research is that people frequently switch postures, and that posture switch is often seen as a positive thing, as we move about to remain comfortable. So in our keyboard layouts we also considered what it would be like to type for a period of time—say, an email to your mom—and switch postures while you do it. You might start by typing with the tablet lying on the coffee table, for example, but then you might tire of that posture and pick up the tablet, lie back on the couch, and interact with two thumbs.
Further research into posture and comfort helped us to understand how people hold tablets, and how far our thumbs typically reach. In a follow-up study, we had a wide selection of people with different hand sizes use a tablet with sensors that would indicate where their thumbs could reach most comfortably, where they could extend to, and where reach was just uncomfortable. These results helped us optimize the use of the system with thumbs, and helped shape the thumb keyboard layout.
This heat map illustrates the typical reach of people’s thumbs, overlaid on the thumb keyboard layout. Green is very comfortable, yellow can be reached, and red is typically uncomfortable.
Typing on glass
The next challenge we considered was the experience of typing on the glass display of a tablet. At least one of the key postures (laying the tablet down) is analogous to typing on a physical keyboard. So unlike typing text on a phone, we were faced with direct comparisons with the physical keyboard experience. When you type on your laptop or desktop, you enjoy some real benefits. You get a lot of sensory feedback as you type. First, you can position your hands quickly on your home keys, and most keyboards have small bumps on the J and F keys (in English QWERTY keyboards) to confirm that position. Then, as you type, the shape of the keys reinforces where your fingers are as they move about. The keys have “travel,” or small up-and-down movement, which confirms that you struck them. And because the keyboard is mechanical, there is a tapping sound that confirms your key strikes (perhaps to your chagrin, if your colleagues are checking email during meetings).
If you lay down a piece of glass and type on it, you get no feedback; there is no indication for where to position your hands, and there is no indication of whether you’ve hit a target or not. Recognizing this, we made a few decisions. We needed to provide some type of feedback, and we needed to recognize that people will be more “sloppy” when typing on a touch keyboard. But we also observed that a touch keyboard can do things that a physical keyboard can’t, and we should bring those functions out.
The feedback you see in the touch keyboard comes in two forms—the keys change color when you touch them, and they trigger a subtle sound. This is similar to what you see on most phone touch keyboards. We considered other forms of feedback, but ruled them out as too disruptive or unnatural. For example, we explored haptic feedback (a vibration of the device based on input) which you also find on many phones. But most people find the current state-of-the-art haptics somewhat irritating when typing pieces of any length and a buzz can feel as much like a punishment as a reassurance.
Our two forms of feedback—visual key changes and sounds—are not without controversy either. Visual key changes are not always ideal when you are entering a password, for example, and for that reason we enable you to suppress feedback in these cases. Some people have argued that key press sounds are irritating and artificial. But user testing confirmed our assumption that people clearly find the sounds reassuring and confidence-inspiring when typing on glass. The specific sounds we use (which are very similar to those on the Windows Phone) are designed to be “residual,” where you quickly forget that they are there, but would notice if they were turned off.
Both forms of feedback may be used more when people are first getting used to the experience. We have done eye-tracking studies in the lab, which showed that as people become more proficient with the touch keyboard, they spend more time looking at the input field, and less time looking at the keyboard itself. So the appearance of each character becomes the best feedback when you are typing efficiently. I’ll tell you a little more about these eye-tracking studies later in this post.
As people spend time with the touch keyboard, their focus moves more consistently to the input field, as this heat map from an eye-tracking study shows.
But even when you “get good” at typing on a touch keyboard on glass, you will still be sloppier and slower than you would be with a physical keyboard. The Windows 8 touch keyboard has some special accommodations to address this reality. The most interesting one is what we call the “touch model.”
When you tap a key on the touch keyboard, we detect the coordinates of your touch, and we can map it to the geometry of the keys. But as your fingers move about across the glass, your press is likely to migrate outside the boundaries of the key you intended to touch. If we relied simply on the geometry mapping of the keys, you would see a lot of errors. To account for this, the key press is first compared against a model that assesses the likelihood that you intended to strike that key or a key near it. This processing is informed by two things. First, we use data from many people typing pangrams (phrases that use every letter of the alphabet), recording trends where people bias their touch away from the intended target. For example, they might intend to type a p, but often strike the o, because most people’s fingers curve inward. Based on a set of characteristics, including typing speed, the model weights the likelihood that you intended to type one key over another. Second, we use lexical data representing letters and words that are likely to be strung together in writing. This is the same system that enables spelling correction: the system “knows” what you probably intended to type even if you made a mistake.
Based on the touch model, the keyboard is often able to quietly correct cases where you intended to type a p for example, but inadvertently struck the o, on a QWERTY layout. Or consider the example where you are typing the word “the.” If you type t then h and then touch between the e and w but slightly more on the w, the touch model adjudicates this, knows that t-h-e is the common character combination in English rather than t-h-w, and appropriately outputs the e. But if you touch the w fully, the keyboard respects that input and assumes you know best. This all happens while you are typing, so the right character goes into the input field and doesn’t require further correction. When this works best, you don’t realize it’s even happening, increasing your confidence in typing on glass.
This map from a report on touch model data illustrates biases that people show toward certain keys when typing on a touch keyboard.
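To make the "t-h-e" example concrete, here is a minimal sketch of how a touch model of this kind could combine where you touched with what you were likely to type. It is not the actual Windows 8 implementation: the key coordinates, the Gaussian spread, the "sure hit" radius, and the letter probabilities are all made-up values chosen for illustration.

```python
import math

# Hypothetical centers (in mm) of four top-row keys on a 19 mm pitch.
KEY_CENTERS = {"q": (9.5, 9.5), "w": (28.5, 9.5), "e": (47.5, 9.5), "r": (66.5, 9.5)}

# Toy lexical data: assumed probability of the next letter after a prefix.
NEXT_LETTER_PRIOR = {"th": {"e": 0.60, "r": 0.05, "w": 0.01, "q": 0.001}}

SIGMA_MM = 6.0      # assumed spread of touches around a key center
SURE_HIT_MM = 5.0   # assumed radius inside which the touch is taken literally
DEFAULT_PRIOR = 0.01  # assumed prior when the prefix is unknown


def spatial_likelihood(touch, center, sigma=SIGMA_MM):
    """Unnormalized Gaussian likelihood that a touch was aimed at `center`."""
    d = math.dist(touch, center)
    return math.exp(-(d * d) / (2 * sigma * sigma))


def resolve_key(touch, prefix):
    """Pick the key the typist most likely intended."""
    nearest = min(KEY_CENTERS, key=lambda k: math.dist(touch, KEY_CENTERS[k]))
    # A touch squarely on a key is respected as typed.
    if math.dist(touch, KEY_CENTERS[nearest]) <= SURE_HIT_MM:
        return nearest
    # Otherwise weigh touch position against the lexical prior.
    prior = NEXT_LETTER_PRIOR.get(prefix, {})
    return max(
        KEY_CENTERS,
        key=lambda k: spatial_likelihood(touch, KEY_CENTERS[k]) * prior.get(k, DEFAULT_PRIOR),
    )
```

In this sketch, a touch at (36.0, 9.5) after "th" lands between w and e, slightly closer to w, and resolves to e because the lexical prior outweighs the small spatial advantage. A touch near the center of w resolves to w regardless of the prefix.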
Great for typing
Once we accounted for feedback and provided “guard rails” for inevitable mistakes, we still had to determine the specific keyboard layouts—what keys go where. Key positions have a big influence over typing speed and accuracy, and people have very strong—and often conflicting—opinions about keys. But the design problem broke down logically, based on our observations of interaction and some physical realities. For example, we confirmed our assumptions that:
- Most people have developed very strong habits based on the conventions of physical keyboards. When you break these conventions, it slows their typing down appreciably. This even applies to very young folks or dedicated T9 typists, for example, as most of us learn to touch-type in some form at a young age.
- There are optimal targetable sizes of keys. The extensive research Microsoft has done into physical keyboards applied here too. For example, the letter keys on our touch keyboard are 19mm wide, the same as on most physical keyboards, because people showed faster typing speeds with targets of that size (rather than smaller or larger).
- The more keys you include, the more likely people are to make mistakes. This is partly because more keys mean the keys need to be smaller and there’s a greater likelihood of hitting a key you didn’t intend. More keys also create visual clutter and distraction and slow your ability to scan and find a key.
- You don’t want to obscure more than half the display with a keyboard. A too-large keyboard creates a claustrophobic experience and you lose context. However, there is a counter rule that says obscuring about half the display works fine. This is because entering text is most often a “modal” activity, where your focus is very much on typing something and not on the periphery. Your area of focus outside the keyboard is relatively small, and directed toward the characters you’re typing. Our eye-tracking studies, illustrated in this post, demonstrate this.
- People use some keys more than others. We deduce this from analyzing passages of text written in real-world circumstances. There are clear patterns of frequency in the use of letters and symbols.
- People will learn to do new things—and learn quickly—if they don’t interfere with habits.
So in the end, the layout of a touch keyboard in any language becomes a balancing act of the different factors. You want to reduce the number of keys in the default layout, for example, but if you remove a key people rely on in typing every day, you will frustrate them. The layout needs to be big enough to support accuracy, but not so big it obscures the application.
There was one more overall rule or principle that we applied to the keyboard layouts specifically: They must be great for typing. That seems obvious but it’s clarifying when you recognize that keyboards are used for a lot of things other than writing words—shortcuts to UI, for example, or sending commands, or entering codes. Our keyboard is optimized for typing, because that is its primary purpose and it must do it well above all other things. Let’s take a look at a few of the decisions we made that fit within these parameters.
We get a lot of questions about why we don’t include a number row in the default keyboard layout. We use numbers frequently in our jobs, and we’re used to finding number keys on the top of our physical keyboard. The Windows 7 on-screen keyboard has a number row, for example. This is consistent with the overall design of that keyboard—it is essentially a software emulation of a physical keyboard. It has not been optimized for a world of touch.
The Windows 7 on-screen keyboard emulates a physical keyboard and isn’t optimized for touch or typing.
Some of our early designs and prototypes had a number row too. But when we brought these designs in front of people, the feedback was strong that the keyboard felt “cramped” compared to what they were used to. We observed frequent errors and accidental invocation of keys, especially around the perimeter of the layout. This resulted in a number of changes, and it confirmed the decision to not include a number row. Here’s why: Including a number row meant adding a fourth row of character keys. When we optimize for keys with a targetable size, that means the keyboard must be that much taller. On a typical tablet device (say with a screen size of 10.6 inches) adding a number row would mean that more than half of the display would be covered by the keyboard. When we combined this with the observation that numbers are typed less frequently than most letters and common symbols, and with the evidence that the extra keys were causing accidental key presses, we settled on placing numbers in the separate number and symbol view.
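The screen-coverage argument is simple arithmetic, and a rough sketch shows how one extra row can tip the keyboard past the halfway mark. The 10.6-inch diagonal comes from the example above; the 16:9 aspect ratio and the 16 mm row height are assumptions made only for illustration, since the post does not state either.

```python
import math

MM_PER_INCH = 25.4
ROW_PITCH_MM = 16.0  # assumed row height; not a figure from the post


def landscape_height_mm(diagonal_in, aspect=(16, 9)):
    """Physical height of a landscape display, given its diagonal and aspect ratio."""
    w, h = aspect
    return diagonal_in * MM_PER_INCH * h / math.hypot(w, h)


def keyboard_coverage(rows, row_pitch_mm, diagonal_in, aspect=(16, 9)):
    """Fraction of the display height covered by a keyboard with `rows` rows."""
    return rows * row_pitch_mm / landscape_height_mm(diagonal_in, aspect)


# Four rows: three letter rows plus the space bar row. Five rows adds a number row.
for rows in (4, 5):
    share = keyboard_coverage(rows, ROW_PITCH_MM, 10.6)
    print(f"{rows} rows cover {share:.0%} of the screen height")
```

Under these assumptions the four-row layout stays just under half of the display, while a five-row layout with a number row covers roughly three fifths of it.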
That settled, we still had debates about whether to display numbers as a row across the top of the numbers and symbols view, or to display it as a numeric pad. We chose the numeric pad for a few reasons:
- People often enter multiple numbers at once.
- It’s easier to scan an organized group than a long row.
- People type number sequences much faster when the numbers are clustered.
We also decided to include the numbers in 1,2,3 order from the top, rather than 7,8,9, as it appears on many extended computer keyboards or cash registers. This is an interesting case where the physical keyboard convention didn’t matter as much, because people have become familiar and very comfortable with the order of number pads on phones, ATMs, remote controls, and other modern devices. 1,2,3 order is simply easier for the eyes to scan and the brain to process than any other order.
The number and symbol view includes a numeric pad that reflects modern layouts we find on phones, ATMs, and remote controls.
The tab key has a similar story. It’s a key we use a lot—for formatting documents, but also for things like navigating input fields on a webpage. For that reason, we included it in one of our early touch-optimized layouts, after we had removed a lot of other keys typically found on physical keyboards. It looked like this.
An early layout of the keyboard had extra keys that interfered with accuracy and speed.
You might observe that on the right and the left, there are borders of keys that aren’t letters or symbols. This layout yielded the results described above—people experienced a cramped feeling. And worse than that, they frequently missed character keys and inadvertently touched one of the border keys. When we removed them, people raved about the openness and comfort of the layout, their errors went down, and their speed went up. With the Tab key on the numbers and symbols view, it was harder to reach—but the keyboard was better for typing, and so the Tab key’s peregrinations were over.
Downshift: a mistake to learn from
The last example we’ll share involves a feature we had in the product and have subsequently cut. This is a feature inspired by our desire to make punctuation easier to get to, without a complete view switch. In this design, the left shift key acted as the shift key does today—it enabled capital letters and access to alternate symbols from the default view. We used the right shift key differently—it provided a “peek” into frequently-used symbols or punctuation. The idea was that you would “downshift” briefly to select punctuation, for example, but not lose the context of the main view, and thus be faster. We theorized that this was a place where we could deviate from convention and provide value you could only get with software. Here’s a picture of the “downshift” keyboard.
The downshift design was intended to provide a fast way to access symbols, but interfered with expectations for shift behavior.
Suffice it to say, this prototype did not succeed in the lab. Participants continually struck the right shift key for the usual reasons you’d use a shift key. And when the keyboard showed the “peek” to symbols, they were confused and their typing came to a halt. So this was a case where we had to stick with the convention of a physical keyboard.
There is an interesting counter example in press-and-hold behavior. On a physical keyboard, when you press and hold a character, it repeats. On our touch keyboard when you press and hold, we show alternate characters or symbols. This is something a touch keyboard can do well and a physical keyboard can’t. If you don’t know the specific key combination to show ñ or é or š, for example, it’s painful to type on a physical keyboard. It’s easy to find on the touch keyboard. Practically no one has complained about this departure from convention. We built on it, in fact. You might discover that you can simply swipe from a key in the direction of the secondary key, and that character will be entered, without an explicit selection from the menu. So if you use accented characters a lot, you can get pretty fast with this. Try it out!
When you press and hold a key, it reveals related keys. If you swipe quickly toward the secondary key you want, you can select it quickly.
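One way to picture the swipe shortcut is as a direction match: compare the direction of the swipe with the direction of each secondary key and pick the closest. The sketch below does this with cosine similarity. The popup layout for "e", the minimum swipe distance, and the function name are hypothetical, and this is not a description of how the shipping keyboard does it.

```python
import math

# Hypothetical popup layout: offsets of each alternate relative to the base key,
# in screen coordinates (x grows rightward, y grows downward).
ALTERNATES_E = {"é": (-1, -1), "è": (0, -1), "ê": (1, -1), "ë": (1, 0)}

MIN_SWIPE_PX = 20  # assumed threshold below which the gesture counts as a plain tap


def pick_alternate(base, alternates, start, end, min_swipe=MIN_SWIPE_PX):
    """Return the alternate whose direction best matches the swipe, else the base key."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length < min_swipe:
        return base

    def cosine(offset):
        ox, oy = offset
        return (dx * ox + dy * oy) / (length * math.hypot(ox, oy))

    return max(alternates, key=lambda ch: cosine(alternates[ch]))
```

With this layout, a swipe straight up from the key selects è, a swipe to the right selects ë, and a movement of only a few pixels still enters a plain e.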
Testing and validating
We’ve been conducting a series of eye-tracking studies, where cameras record the direction of the participants’ gaze as they are interacting with the system. These studies help us determine a few things: Where do people look when typing on a touch keyboard? Does visual gaze change over time? Are these patterns consistent across different views or layouts? And is visual gaze correlated to speed of typing?
An eye-tracking study participant begins the session.
We’ve found very consistently that people look primarily at two places: the text field where their characters appear, and the keyboard. This is so consistent that we designed our text suggestion experience to optimize for this tendency. Text suggestions (words that are predicted as you type) appear right by the cursor in the text field, and you insert them by touching the “Insert” key on the touch keyboard. This is optimized for where we saw people putting their attention as they typed. It is notably different, for example, from the text suggestion UI you see on many phones, where a band of possible words runs across the top of the keyboard. On a PC with a full-sized keyboard, people just don’t look there, and they don’t want to stop typing and change their posture to select these words.
Individual fixations, or recordings of a stabilized retina, show that people look either at the keyboard or at the text field. We do not typically look in between the two. Our text prediction UI appears near the caret for this reason.
We also found that our gaze does change over time, and as the gaze changes, we type faster. You can see this very clearly in the gaze plots of the eye-tracking studies. A full range of people show this tendency—from slow typists unfamiliar with tablets to skilled typists who spend a lot of time with tablets. In all cases, at first, there is more attention on the keyboard, and the speed is slower. Over time—say, about 90 minutes over a few days—there is markedly less attention paid to the keyboard, more to the text field, and words per minute go up significantly.
We can see in lab studies that the focus of our gaze changes over time. The left-hand image shows a typist after just a few minutes. The right-hand image shows the gaze plots after about 90 minutes. You can see that focus moves to the text field. This typist doubled her speed during the session.
Lastly, below is a picture of the current English QWERTY layout, which we have in the Windows 8 Release Preview. It is intentionally spare and open, and the keys that remain are there for explicit reasons. Each of these has its own story, but we can call out a few highlights:
- The backspace key is there because it’s used very frequently on physical keyboards and touch keyboards. If we removed it, you would find your finger groping for it repeatedly.
- The mode switch key is essential to moving between views and languages and for hiding the keyboard. IME users will find that this is how you switch to Windows IMEs, which also feature touch-optimized keyboard layouts.
- The CTRL key and the right and left arrow keys are intended for text editing operations. You can move your input cursor and cut, copy, and paste without moving your hands from the keyboard. (Note that the CTRL key works just as it does on a physical keyboard, so any supported combination will work. We include labels for things like cut, copy, paste, and bold, because they are related to text editing.) The touch keyboard is not intended for “commanding,” which is why you don’t see things like the Windows key or function keys. That is a deliberate decision to stay focused on the goal of being really great for typing.
- The space bar is centered and wide. Physical keyboard research shows that about 80% of strikes on the space bar occur on the right (if you look at older keyboards, you will notice the wear on that side). This holds for touch keyboards too, where people will miss the space bar if it isn’t amply sized, and this creates errors that are hard to recover from.
- The “emoji” or emoticon key switches you to emoji view, where we support a full set of Unicode-based emoji characters. The use of emoji continues to grow worldwide, and has become a part of how people write and express themselves.
- We also include an option for a standard keyboard layout, which can be useful on a PC without a keyboard when using desktop software that requires function keys or other extended keys. This is easily enabled from the Settings charm, in the General Settings section of PC Settings.
As you use the keyboard, we hope you also discover some extra features we’ve added to make things easier. For example, if you hold down the &123 key, you can select symbols or numbers with your other hand, and when you release, you return to your original view. The team calls this “multi-touch view peek.”
The current touch-optimized layout reflects decisions about each of the keys based on a series of studies.
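The "multi-touch view peek" behavior can be thought of as a small state machine. The sketch below is one way to model it, under an assumption the post does not spell out: that a plain tap on &123 switches views and stays there, while holding it and typing with the other hand returns you to the original view on release. The class and method names are hypothetical.

```python
class TouchKeyboard:
    """Toy model of view switching with the &123 key."""

    def __init__(self):
        self.view = "letters"
        self.output = []
        self._peek_origin = None        # view to return to after a peek
        self._typed_during_hold = False

    def mode_key_down(self):
        """Finger lands on &123: remember the current view and flip to the other one."""
        self._peek_origin = self.view
        self._typed_during_hold = False
        self.view = "symbols" if self.view == "letters" else "letters"

    def key_tap(self, char):
        """Another finger taps a key, possibly while &123 is still held."""
        self.output.append(char)
        if self._peek_origin is not None:
            self._typed_during_hold = True

    def mode_key_up(self):
        """Finger leaves &123: return to the original view only if this was a peek."""
        if self._peek_origin is not None and self._typed_during_hold:
            self.view = self._peek_origin
        self._peek_origin = None


kb = TouchKeyboard()
kb.key_tap("a")
kb.mode_key_down()   # hold &123
kb.key_tap("7")      # type a number with the other hand
kb.mode_key_up()     # release: back to the letters view
```

After this sequence the keyboard is back in the letters view with "a" and "7" entered. A tap on &123 with no key typed in between leaves the symbols view showing.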
These optimizations apply across the input languages we have in Windows, as we support a touch-optimized typing experience worldwide. We expect to make a few more improvements to the typing experience, and we are really grateful and delighted by the feedback we’ve received so far. Thanks!