A Look Back at Some Projects Presented at Microsoft TechFest

05/04/2011

Every year in March, Microsoft presents its main research projects at an internal show. Here is a list of some of the public projects, to give you an idea of the innovations to come. Absolutely fascinating!

3-D, Photo-Real Talking Head

Our research showcases a new, 3-D, photo-real talking head with freely controlled head motions and facial expressions. It extends our prior, high-quality, 2-D, photo-real talking head to 3-D. First, we apply a 2-D-to-3-D reconstruction algorithm frame by frame on a 2-D video to construct a 3-D training database. In training, super-feature vectors consisting of 3-D geometry, texture, and speech are formed to train a statistical, multistreamed, Hidden Markov Model (HMM). The HMM is then used to synthesize both the trajectories of geometric animation and dynamic texture. The 3-D talking head can be animated by the geometric trajectory, while the facial expressions and articulator movements are rendered with dynamic texture sequences. Head motions and facial expressions can also be controlled separately by manipulating the corresponding parameters. The new 3-D talking head has many useful applications, such as voice agents, telepresence, gaming, and speech-to-speech translation.

3-D Scanning with a Regular Camera

3-D television is creating a huge buzz in the consumer space, but the generation of 3-D content remains a largely professional endeavor. Our research demonstrates an easy-to-use system for creating photorealistic, 3-D-image-based models simply by walking around an object of interest with your phone, still camera, or video camera. The objects might be your custom car or motorcycle, a wedding cake or dress, a rare musical instrument, or a handcrafted artwork. Our system uses 3-D stereo matching techniques combined with image-based modeling and rendering to create a photorealistic model you can navigate simply by spinning it around on your screen, tablet, or mobile device.

Applied Sciences Group: Smart Interactive Displays

Our research shows:

Steerable AutoStereo 3-D Display: We use a special, flat optical lens (Wedge) behind an LCD monitor to direct a narrow beam of light into each of a viewer’s eyes. By using a Kinect head tracker, the user’s relation to the display is tracked, and thereby, the prototype is able to steer that narrow beam to the user. The combination creates a 3-D image that is steered to the viewer without the need for glasses or holding your head in place.

Steerable Multiview Display: The same optical system used in the 3-D system, Wedge behind an LCD, is used to steer two separate images to two separate people rather than two separate eyes, as in the 3-D case. Using a Kinect head tracker, we find and track multiple viewers and send each viewer his or her own unique image. Therefore, two people can be looking at the same display but see two completely different images. If the two users switch positions, the same image continuously is steered toward them.

Retro-Reflective Air-Gesture Display: Sometimes, it’s better to control with gestures than buttons. Using a retro-reflective screen and a camera close to the projector makes all objects cast a shadow, regardless of their color. This makes it easy to apply computer-vision algorithms to sense above-screen gestures that can be used for control, navigation, and many other applications.

A display that can see: Using the flat Wedge optic in camera mode behind a special, transparent organic-light-emitting-diode display, we can capture images that are both on and above the display. This enables touch and above-screen gesture interfaces, as well as telepresence applications.

Kinect-based Virtual Window: Using Kinect, we track a user’s position relative to a 3-D display to create the illusion of looking through a window. This view-dependent rendering technique is used in both the Wedge 3-D and multiview demos, but the effect is much more apparent in this demo. The user should quickly realize the need for a multiview display, because with a conventional display this illusion is valid for only one user. This technique, together with the Wedge 3-D output and the 3-D input techniques we are developing, forms the basic building blocks for the ultimate telepresence display. This Magic Window is a bidirectional, light-field, interactive display that gives multiple users in a telepresence session the illusion that they are interacting with and talking to each other through a simple glass window.
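To make the window illusion concrete, here is a minimal sketch of one common way to do view-dependent rendering: an off-axis (asymmetric) viewing frustum recomputed from the tracked head position. The description above does not say how the demo renders its view, so this is an assumption, not the group's method. The function name `off_axis_frustum`, the coordinate convention (screen centered at the origin in the z = 0 plane, viewer at z > 0, units in meters), and all parameters are hypothetical.

```python
def off_axis_frustum(eye, screen_width, screen_height, near):
    """Return (left, right, bottom, top) frustum bounds at the near plane.

    eye: (x, y, z) head position relative to the screen center, z > 0.
    This is the standard off-axis projection setup, assumed here for
    illustration; it is not taken from the demo itself.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("viewer must be in front of the screen (z > 0)")
    # Similar triangles: project the screen edges onto the near plane.
    scale = near / ez
    left = (-screen_width / 2 - ex) * scale
    right = (screen_width / 2 - ex) * scale
    bottom = (-screen_height / 2 - ey) * scale
    top = (screen_height / 2 - ey) * scale
    return left, right, bottom, top


# A centered viewer gets a symmetric frustum; stepping sideways skews it.
centered = off_axis_frustum((0.0, 0.0, 0.6), 0.5, 0.3, 0.1)
shifted = off_axis_frustum((0.2, 0.0, 0.6), 0.5, 0.3, 0.1)
```

Because the frustum depends on a single eye position, only one viewer sees a correct perspective at a time, which is the limitation of a conventional display that the paragraph above points out.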

Cloud Data Analytics from Excel

Excel is an established data-collection and data-analysis tool in business, technical computing, and academic research. Excel offers an attractive user interface, easy-to-use data entry, and substantial interactivity for what-if analysis. But data in Excel is not readily discoverable and, hence, does not promote data sharing. Moreover, Excel does not offer scalable computation for large-scale analytics. Increasingly, researchers encounter a deluge of data, and when working in Excel, it is not easy to invoke analytics to explore data, find related data sets, or invoke external models. Our project shows how we seamlessly integrate cloud storage and scalable analytics into Excel through a research ribbon. Any analyst can use our tool to discover and import data from the cloud, invoke cloud-scale data analytics to extract information from large data sets, invoke models, and then store data in the cloud—all through a spreadsheet with which they are already familiar. Learn more...

Controlling Home Heating with Occupancy Prediction

Home heating uses more energy than any other residential energy expenditure, so increasing its efficiency is an important goal for saving money and protecting the environment. We have built a home-heating system, PreHeat, that automatically programs your thermostat based on when you are home. PreHeat’s goal is to reduce the amount of time a household’s thermostat needs to be on without compromising the comfort of household members. PreHeat builds a predictive model of when the house is occupied and uses the model to optimize when the house is heated, to save energy without sacrificing comfort. Our system consists of Wi-Fi and passive, IR-based occupancy sensors; temperature sensors; heating-system controllers for U.S. forced-air systems and for U.K. water-filled radiators and under-floor heating; and PC-based control software using machine learning to predict schedules based on current and past occupancy.
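The idea of predicting occupancy and heating ahead of it can be sketched in a few lines. This is a deliberately simple illustration, not PreHeat's actual model: it replaces the machine-learning predictor with a per-weekday frequency estimate. The 15-minute slot size, the 0.5 threshold, the two-slot preheat lead, and the names `occupancy_probability` and `heating_schedule` are all assumptions made for this sketch.

```python
SLOTS_PER_DAY = 96  # 15-minute slots; an assumption for this sketch


def occupancy_probability(history, weekday, slot):
    """Fraction of past days with the same weekday that were occupied in `slot`.

    history: list of (weekday, slots) pairs, where slots is a list of
    SLOTS_PER_DAY values, 1 if the house was occupied and 0 otherwise.
    """
    same_days = [slots for day, slots in history if day == weekday]
    if not same_days:
        return 0.5  # no data yet: stay undecided
    return sum(slots[slot] for slots in same_days) / len(same_days)


def heating_schedule(history, weekday, threshold=0.5, preheat_slots=2):
    """Return a list of booleans: heat on/off for each slot of the day.

    Heating starts `preheat_slots` before each slot predicted to be
    occupied, so the house is already warm when people arrive.
    """
    heat = [False] * SLOTS_PER_DAY
    for slot in range(SLOTS_PER_DAY):
        if occupancy_probability(history, weekday, slot) >= threshold:
            for k in range(max(0, slot - preheat_slots), slot + 1):
                heat[k] = True
    return heat
```

Given a few weeks of sensor logs, `heating_schedule(history, weekday=0)` would return Monday's plan. A real system would also need to account for how long the house takes to warm up and for occupancy observed earlier the same day.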

Face Recognition in Video

Face recognition in video is an emerging technology that will have great impact on user experience in fields such as television, gaming, and communication. In the near future, a television or an Xbox will be able to recognize people in the living room, home video will be annotated automatically and become searchable, and TV watchers will be able to get information about an unfamiliar actor, athlete, or singer just by pointing to the person on the screen. Our research showcases the face-recognition technology developed by iLabs. Our technology includes novel algorithms in face detection, recognition, and tracking. The research demonstrates semi-automatic labeling of videos, a novel TV-watching experience using faces in a video as hyperlinks to get more information, and automatic recognition of the person in front of the television, Xbox, or computer. Learn more...

Fuzzy Contact Search for Windows Phone 7

Mobile-phone users typically search for contacts in their contact list by keying in names or email IDs. Users frequently make various types of mistakes, including phonetic, transposition, deletion, and substitution errors, and, in the specific case of mobile phones, the nature of the input mechanism makes mistakes more probable. We propose a fuzzy-contact-search feature to help users find the right contacts despite making mistakes while keying in a query. The feature is based on the novel, hashing-based spelling-correction technology developed by Microsoft Research India. We support many languages, including English, French, German, Italian, Spanish, Portuguese, Polish, Dutch, Japanese, Russian, Arabic, Hebrew, Chinese, Korean, and Hindi. We have built a Windows Phone 7 app to demonstrate our fuzzy contact search. The solution is lightweight and can be used in any client-side contact-search scenario. Learn more...
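As an illustration of what fuzzy contact search has to tolerate, here is a minimal sketch that matches a query against contact names while allowing one substitution, insertion, deletion, or transposition of adjacent characters. It swaps in a plain edit-distance comparison (optimal string alignment) and is not the hashing-based spelling-correction technology from Microsoft Research India, which the text above does not detail. It also does not handle phonetic errors. The names `osa_distance` and `fuzzy_search` and the `max_distance` parameter are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: edits plus adjacent transpositions."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # transposition
        prev2, prev = prev, cur
    return prev[len(b)]


def fuzzy_search(query, contacts, max_distance=1):
    """Return contacts with a name part within max_distance edits of the query."""
    q = query.casefold()
    scored = []
    for name in contacts:
        tokens = name.casefold().split()
        best = min((osa_distance(q, t) for t in tokens), default=len(q))
        if best <= max_distance:
            scored.append((best, name))
    # Closest matches first; ties are broken alphabetically.
    return [name for _, name in sorted(scored)]
```

With a contact list such as `["John Smith", "Joan Baker", "Priya Nair"]`, the query `"jhon"` matches only John Smith, because the transposed letters count as a single error. Comparing the query against every contact is linear in the size of the list, which is the cost a hashing-based approach would aim to avoid.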

High-Performance Cancer Screening

Our research demonstrates high-performance, GPU-based 3-D rendering for colon-cancer screening. The VCViewer provides a gesture-based user interface for the navigation and analysis of 3-D images generated by computed-tomography (CT) scans for colon-cancer screening. This viewer is supported by a server-side volume-rendering engine implemented by Microsoft Research. Our work shows a real-world, life-saving medical application for this engine. In addition, we show high-performance, CPU-based image processing needed to prepare CT colonoscopy images for diagnostic viewing. This processing was developed at the 3-D Imaging Lab at Massachusetts General Hospital and has been adapted for task and data parallelism in joint collaboration with Microsoft Developer and Platform Evangelism, Microsoft Research, and Intel.

InnerEye: Visual Recognition in the Hospital

Our research shows how a single, underlying image-recognition algorithm can enable a multitude of clinical applications, such as semantic image navigation, multimodal image registration, quality control, content-based image search, and natural user interfaces for surgery, all within the Microsoft Amalga unified intelligence system.

Interactive Information Visualizations

Our research presents novel, interactive visualizations to help people understand large amounts of data:

o iSketchVis applies the familiar, collaborative features of a whiteboard interface to the accurate data-exploration capabilities of computer-aided data visualization. It enables people to sketch charts and explore their data visually, on a pen-based tablet—or collaboratively, on whiteboards.

o NetCharts enables people to analyze large data sets consisting of multiple entity types with multiple attributes. It uses simple charts to show aggregated data. People can explore these aggregates by dragging them out to create new charts.

o Sets traditionally are represented by Euler diagrams with bubble-like shapes. This research presents two techniques to simplify Euler diagrams. In addition, we demonstrate LineSets, which uses a single, continuous curve to represent sets. It simplifies set intersections and offers multiple interactions.

MirageBlocks

Our research demonstrates the use of 3-D projection, combined with a Kinect depth camera to capture and display 3-D objects. Any physical object brought into the demo can be digitized instantaneously and viewed in 3-D. For example, we show a simple modeling application in which complex 3-D models can be constructed with just a few wooden blocks by digitizing and adding one block at a time. This setup also can be used in telepresence scenarios, in which what is real on your collaborator’s table is virtual—3-D projected—on yours, and vice versa. Our work shows how simulating real-world physics behaviors can be used to manipulate virtual 3-D objects. Our research uses a 3-D projector with active shutter glasses.

Mobile Photography: Capture, Process, and View

The mobile phone is becoming the most popular consumer camera. While the benefits are quite clear, the mobile scenario presents several challenges. It is not always easy to capture good photos. Image-processing tools can improve photos after capture, but there are few tools tailored to on-phone image manipulation. We present phone-based image-enhancement tools that are tightly integrated with cloud services. Heavy computation is off-loaded to the cloud, which enables faster results without impacting the phone’s performance. Learn more...

Project Emporia: Personalized News

Project Emporia is a personalized news reader offering 250,000 articles daily as discovered through social news feeds. It combines state-of-the-art recommendation systems (Matchbox) with automatic content classification (ClickPredict) to enable users to fine-tune their news channels by category or a custom-keyword channel, combined with "more-like-this"/"less-like-this" votes. It is available as a mobile client as well as on the web. Learn more...

Recognizing Pen Grips for Natural UI

By enabling multitouch sensing on a digital pen, we can recognize how the user is holding it. In the real world, people hold tools such as pens, paintbrushes, sketching pencils, knives, and compasses differently, and we enable a user to alter the grip on a digital pen to switch between functionalities. This enables a natural UI on the pen—mode switches are no longer necessary. Learn more...

Rich Interactive Narratives

Recent advances in visualization technologies have spawned a potent brew of visually rich applications that enable exploration over potentially large, complex data sets. Examples include GigaPan.org, Photosynth.net, PivotViewer, and WorldWide Telescope. At the same time, the narrative remains a dominant form for generating emotionally captivating content—movies or novels—or imparting complex knowledge, as in textbooks or journals. The Rich Interactive Narratives project aims to combine the compelling, time-tested narrative elements of multimedia storytelling with the information-rich, exploratory nature of the latest generation of information-visualization and -exploration technologies. We approach the problem not as a one-off application, Internet site, or proprietary framework, but rather as a data model that transcends a particular platform or technology. This has the potential of enabling entirely new ways for creating, transforming, augmenting, and presenting rich interactive content. Learn more...

ShadowDraw: Interactive Sketching Helper

Do you want to be able to sketch or draw better? ShadowDraw is an interactive assistant for freehand drawing. It automatically recognizes what you’re trying to draw and suggests new pen strokes for you to trace. As you draw new strokes, ShadowDraw refines its models in real time and provides new suggestions. ShadowDraw contains a large database of images with objects that a user might want to draw. The edges from any images that match the user’s current drawing are merged and shown as suggested "shadow strokes." The user then can trace these strokes to improve the drawing. Learn more...

Social News Search for Companies

Social News Search for Companies uses social public data to build a great news portal for companies. The curation of this page can be crowdsourced to improve the quality of results. We tackle two questions: How can we use social media to provide a rich, topical, searchable, living news dashboard for any given company, and can we build an environment where the curation of the sources of content for a company page is done by the users of the page rather than by an editor? Learn more...
