{ End Bracket }

Weaving Your Photos with Photosynth

Richard Szeliski

The explosion of digital photography in the past several years has led to a dramatic increase in the number of photos that are shared on the Web. The Interactive Visual Media Group at Microsoft Research has responded to this digital photography phenomenon by developing innovative photo-editing and photo-viewing products. Microsoft Research was among the first to produce fully automated image-stitching software, to develop 360-degree video-based walkthroughs, and to generate 3D videos that can be navigated in real time (see research.microsoft.com/IVM).

One of the biggest challenges in this area has been the development of fully automated systems to build photorealistic 3D models from collections of photographs. While recent advances in this field have been dramatic, such as the texture-mapped 3D building models available in Virtual Earth™ 3D (see maps.live.com), the resulting models do not achieve the full richness and variety inherent in the original photo collections.

To overcome these limitations, I partnered with Noah Snavely and Steve Seitz, both from the University of Washington, in developing a 3D photo-browsing system called Photo Tourism (see phototour.cs.washington.edu). Our system uses computer-vision techniques to reconstruct a partial 3D model of the scene being photographed, along with the 3D position and orientation of each image in a collection. This is accomplished by first extracting distinctive feature points in each image, matching these across the whole collection, and then incrementally reconstructing the 3D camera and scene geometry by solving a large sparse non-linear optimization problem. The user can then navigate from image to image by selecting regions of interest or by using intuitive commands such as "move left" or "move right."
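
To make the optimization step concrete, here is a minimal Python sketch of the kind of objective such a reconstruction minimizes: the sum of squared differences between where a feature was observed in an image and where the current camera and point estimates predict it should appear. This is an illustration under simplifying assumptions, not the Photo Tourism implementation. The Camera type, the yaw-only rotation, and the function names are all hypothetical.

```python
import math
from typing import NamedTuple


class Camera(NamedTuple):
    """Hypothetical, simplified camera: rotation about the vertical axis only."""
    yaw: float    # radians
    tx: float     # translation applied after rotation
    ty: float
    tz: float
    focal: float  # focal length in pixels


def project(cam, point):
    """Project a 3D world point into the image of a pinhole camera."""
    x, y, z = point
    c, s = math.cos(cam.yaw), math.sin(cam.yaw)
    # Rotate the point into the camera frame, then translate.
    xc = c * x + s * z + cam.tx
    yc = y + cam.ty
    zc = -s * x + c * z + cam.tz
    return (cam.focal * xc / zc, cam.focal * yc / zc)


def reprojection_error(cameras, points, observations):
    """Sum of squared pixel residuals over all feature observations.

    Each observation is (camera_index, point_index, u, v). Every residual
    depends on exactly one camera and one point, which is what makes the
    overall problem sparse.
    """
    total = 0.0
    for cam_idx, pt_idx, u, v in observations:
        pu, pv = project(cameras[cam_idx], points[pt_idx])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


# Synthetic scene: two cameras observing three points.
cameras = [Camera(0.0, 0.0, 0.0, 0.0, 500.0), Camera(0.2, -1.0, 0.0, 0.0, 500.0)]
points = [(0.0, 0.0, 5.0), (1.0, 0.5, 6.0), (-1.0, -0.5, 8.0)]
observations = [
    (ci, pi, *project(cam, pt))
    for ci, cam in enumerate(cameras)
    for pi, pt in enumerate(points)
]

# A wrong guess for the first point raises the error above zero.
moved = [(0.3, 0.0, 5.0)] + points[1:]
print(reprojection_error(cameras, points, observations))
print(reprojection_error(cameras, moved, observations))
```

A real system would hand this objective to a non-linear least-squares solver that adjusts all cameras and points together, adding images a few at a time.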

To smooth transitions between images and give a sense of 3D motion, our system simulates 3D camera moves while projecting the images onto planar "impostors" (proxies), a technique sometimes used in computer games to model distant geometry. A sketched 3D model of the scene, consisting of a point cloud, line segments, and low-resolution "watercolor washes," is also used. The resulting 3D image browser combines the realism and the beauty of a traditional slideshow with the 3D interactivity of video games.
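
The idea of a planar proxy can be sketched in a few lines of Python. A pixel's viewing ray is intersected with a plane standing in for the scene, and the resulting 3D point is then re-projected into a virtual camera that slides between the two real camera positions. The helper names, the fixed forward-looking camera, and the numbers below are assumptions made for illustration, not details of our renderer.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lerp(a, b, t):
    """Linear interpolation between two 3D positions, t in [0, 1]."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def intersect_plane(origin, direction, normal, offset):
    """Point where the ray origin + s * direction meets the plane
    dot(normal, X) == offset, or None if the ray misses it."""
    denom = dot(normal, direction)
    if abs(denom) < 1e-12:
        return None  # ray is parallel to the plane
    s = (offset - dot(normal, origin)) / denom
    if s <= 0:
        return None  # plane is behind the camera
    return tuple(o + s * d for o, d in zip(origin, direction))


def project_pinhole(center, focal, point):
    """Pinhole camera at `center` looking down +z (no rotation, for brevity)."""
    x, y, z = (p - c for p, c in zip(point, center))
    return (focal * x / z, focal * y / z)


# Proxy plane z = 10, seen from camera A; camera B sits 2 units to the right.
cam_a, cam_b, focal = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 500.0
hit = intersect_plane(cam_a, (0.1, 0.0, 1.0), (0.0, 0.0, 1.0), 10.0)

# Where the same proxy point appears as the virtual camera moves from A to B.
for t in (0.0, 0.5, 1.0):
    print(t, project_pinhole(lerp(cam_a, cam_b, t), focal, hit))
```

Because every pixel of the photo lands on the same plane, the whole image can be drawn as a single textured quadrilateral, which keeps the transition cheap to render.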

Once we had the research prototype built, we needed to scale our system to support real-time, multi-resolution streaming of images over the Internet in order to move from the lab environment to general deployment. Microsoft® Live Labs had just acquired a startup called Seadragon, which already had technology that streams images at a variety of resolutions and displays them as animated 3D arrangements. The Seadragon engine converts original images into a set of multi-resolution, overlapping, tiled image fragments, which are streamed on an as-needed basis and combined on the client side to provide visually seamless zooming and progressive image refinement while simultaneously supporting the display of thousands of images.
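
The multi-resolution tiling idea can be illustrated with a short Python sketch. It halves the image repeatedly until it fits in one tile, records how many tiles each level needs, and picks the coarsest level that still has enough pixels for the size at which the image is currently drawn. The 256-pixel tile size, the stopping rule, and the function names are assumptions for illustration, and tile overlap is left out. This is not a description of the actual Seadragon format.

```python
import math


def build_pyramid(width, height, tile_size=256):
    """Return levels from coarsest to finest as (width, height, cols, rows)."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break  # the whole image fits in a single tile
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()
    return levels


def pick_level(levels, on_screen_width):
    """Index of the coarsest level at least as wide as the on-screen image."""
    for index, (w, _h, _cols, _rows) in enumerate(levels):
        if w >= on_screen_width:
            return index
    return len(levels) - 1  # zoomed in past full resolution


levels = build_pyramid(4096, 3072)
for index, (w, h, cols, rows) in enumerate(levels):
    print(index, f"{w}x{h}", cols * rows, "tiles")

# A thumbnail needs one small tile; a close-up needs only the fine tiles in view.
print(pick_level(levels, 200), pick_level(levels, 800), pick_level(levels, 5000))
```

Since a client only fetches the tiles for the level and region it is showing, the cost of displaying an image depends on the screen size rather than on the size of the original photo.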

A small, cross-functional group of designers, program managers, researchers, and developers worked together to design, build, and deploy Photosynth™, a system that combines the 3D image placement and navigation technologies with the underlying Seadragon engine. The initial user interface, which relied on a number of thumbnail panes and icons to control the navigation, underwent several rounds of redesign. In the final design, a glowing quadrilateral indicates the presence of additional images in the vicinity of the user’s mouse. Clicking on the quadrilateral seamlessly transitions the view to the new image. We also added a 3D overview of the scene, a scaled 2D thumbnail view of the images, and a filmstrip control for sequencing through the images and stepping back through the navigation history (see https://photosynth.net).

Plans for future versions of Photosynth include improving the 3D reconstruction algorithms so that they scale to larger data sets and enable consumer-level authoring, and extending the viewer to additional platforms. We are investigating how to integrate Photosynth with regular slideshows and photo-sharing sites, how to achieve smooth circumnavigation of 3D objects, and how to integrate with 3D city models such as those in Virtual Earth. We are also exploring additional applications of Photosynth beyond tourism, such as education (including interactive museum tours and historical time lapses), real estate sales, and indexing large image collections. Our hope is that Photosynth will become an established visual medium combining the beauty and the richness of traditional photography with the interactive exploration inherent in games and 3D worlds.

Richard Szeliski leads the Microsoft Research Interactive Visual Media Group, which invents new ways for people to capture, explore, and share their personal memories.