The World-Wide Telescope
Jim Gray and Alexander Szalay
Someday all scientific literature and data will be online and accessible to everyone everywhere. The astronomy community has made unusually good progress toward this vision of online science and is also addressing the associated challenges of data publication.
Much of astronomy involves comparing data from many instruments (from different parts of the electromagnetic spectrum) taken at different times. The Crab Nebula provides a good example of the temporal and multispectral nature of astronomy. The Crab Star supernova was first observed on July 4, 1054. Now we see it as the Crab Nebula—a gas cloud expanding at relativistic speeds. The cloud's core appears to be a black hole. Matter falling into this hole emits two energy beams that illuminate the cloud. Looking at this system in the X-ray spectral band shows the beams. Looking at the system in the optical and infrared spectral bands shows the gas escaping from the black hole. Each provides complementary information that together presents a fairly complete picture.
Astronomy obeys Moore's law: the volume of data it produces roughly doubles each year. Current instruments typically produce nearly a terabyte per night. Managing huge data archives and processing complex data are now among the major challenges in astronomy.
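To make the growth claim concrete, here is a small arithmetic sketch in Python. It assumes exact yearly doubling, and the function names and the 1 TB starting volume are illustrative choices, not figures from any survey.

```python
def yearly_volume(initial_tb: float, years: int) -> float:
    """Data produced in year `years`, assuming the yearly volume doubles every year."""
    return initial_tb * 2 ** years


def cumulative_volume(initial_tb: float, years: int) -> float:
    """Total data produced in years 0..years inclusive (a geometric series)."""
    return initial_tb * (2 ** (years + 1) - 1)


if __name__ == "__main__":
    # Hypothetical starting point: 1 TB in year 0.
    for year in (0, 5, 10):
        print(year, yearly_volume(1.0, year), cumulative_volume(1.0, year))
```

Under this assumption, year 10 alone yields 1024 TB, slightly more than the 1023 TB accumulated over years 0 through 9. In other words, with yearly doubling an archive must absorb each year as much data as it has ever held.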
The Sloan Digital Sky Survey (SDSS) is a good example. It is a 5-band optical survey of the northern sky, observing about 400 million sources as images and 1 million with spectra. These spectra allow detailed studies of large star and galaxy populations. The traditional way of accessing this data is to place it in files and let users FTP the files they want to their local systems for analysis.
We built an online catalog of the SDSS data as a Web-accessible database, along with visual tools to analyze the data (http://cas.sdss.org/dr7/en/). The result is a SQL Server™ database with approximately 14 billion rows. It gives full GUI and SQL access to the SDSS data. Now everyone can use one of the world's best telescopes. The site has been a big success—about 10 percent of the visitors are students using online courses, but the main users are astronomers analyzing the available data.
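As a rough illustration of what SQL access to a sky catalog buys the user, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the real server, and the table name `photo_obj`, its columns, and the three sample rows are all hypothetical, not the SDSS schema.

```python
import sqlite3

# Hypothetical, tiny stand-in for a sky catalog: (id, RA in degrees,
# Dec in degrees, r-band magnitude). Names and values are illustrative only.
ROWS = [
    (1, 180.10, 2.50, 17.2),
    (2, 180.20, 2.60, 21.5),
    (3, 200.00, -1.00, 16.8),
]


def build_catalog() -> sqlite3.Connection:
    """Create the toy catalog in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo_obj "
        "(obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
    )
    conn.executemany("INSERT INTO photo_obj VALUES (?, ?, ?, ?)", ROWS)
    return conn


def bright_objects_in_box(conn, ra_min, ra_max, dec_min, dec_max, max_mag):
    """Return ids of objects inside an RA/Dec box that are brighter than max_mag.

    Smaller magnitudes mean brighter objects, hence the `<` comparison.
    """
    cursor = conn.execute(
        "SELECT obj_id FROM photo_obj "
        "WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ? AND r_mag < ? "
        "ORDER BY obj_id",
        (ra_min, ra_max, dec_min, dec_max, max_mag),
    )
    return [row[0] for row in cursor]
```

The point of the sketch is that the selection runs where the data lives and only the matching rows travel back to the user, in contrast to fetching whole files and filtering them locally.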
SkyServer must support many browsers running on many platforms, so we took a thin-client approach in which most processing is done on a server that produces standard HTML. Much of the server-side logic is implemented in T-SQL stored procedures. The Web services rendering images are coded in C#. The database schema is self-documenting, and the design allows users to plug their catalog and image data into a spatial search framework and Web service. A C# variant of the spatial data search is part of the SQL Server 2005 samples. You can download a personal copy of all the SkyServer code (Web site and database) along with a version of the database from SkyServer.org.
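The core question a spatial search answers is "which objects lie within a given angle of this point on the sky?" The Python sketch below shows that question in its simplest form: a brute-force scan using the haversine formula. It is not the SkyServer implementation, whose indexing scheme is not described here, and the function names and the `(name, ra, dec)` row layout are assumptions for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees in)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for the small angles typical of sky searches.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding pushing sqrt(h) just above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the names of catalog entries within radius_deg of (ra, dec).

    `catalog` is an iterable of (name, ra_deg, dec_deg) tuples.
    """
    return [
        name
        for name, obj_ra, obj_dec in catalog
        if angular_separation_deg(ra, dec, obj_ra, obj_dec) <= radius_deg
    ]
```

A scan like this touches every row, which is workable for a personal catalog but not for hundreds of millions of sources. A production system would put a spatial index in front of the same distance test so that only nearby candidates are examined.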
SkyServer has quite a few Web services to give users programmatic access to the data and analysis tools, a classic service-oriented architecture. But some of the astronomy queries run for hours, so we set up a system to let users submit long-running jobs (the Catalog Archive Server Jobs System, or CasJobs). We also allowed users to create personal databases (MyDB) near the server. MyDB stores intermediate results and uploaded user data, allowing users to do multistep analyses on huge datasets. The article "Batch is back: CasJobs, serving multi-TB data on the Web" describes this system.
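The pattern of queued jobs plus a personal results database can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the CasJobs design: the `BatchQueue` class, the `mydb_<user>_<table>` naming rule, and the use of an in-memory SQLite database are all hypothetical.

```python
import sqlite3
from collections import deque


class BatchQueue:
    """Toy batch system: queries are queued, run later, and saved per user."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.pending = deque()
        self.status = {}  # job id -> "queued" | "done" | "failed"
        self.next_id = 1

    def submit(self, user: str, select_sql: str, output_table: str) -> int:
        """Queue a SELECT statement; return a job id the user can poll later."""
        if not (user.isidentifier() and output_table.isidentifier()):
            raise ValueError("user and table names must be simple identifiers")
        job_id = self.next_id
        self.next_id += 1
        self.pending.append((job_id, user, select_sql, output_table))
        self.status[job_id] = "queued"
        return job_id

    def run_pending(self) -> None:
        """Run queued jobs in order, storing each result in the user's own table."""
        while self.pending:
            job_id, user, select_sql, output_table = self.pending.popleft()
            try:
                # The result stays next to the data instead of being shipped
                # back to the user; only small final answers need to travel.
                self.conn.execute(
                    f"CREATE TABLE mydb_{user}_{output_table} AS {select_sql}"
                )
                self.status[job_id] = "done"
            except sqlite3.Error:
                self.status[job_id] = "failed"
```

Because each result lands in a table the user owns, a later job can select from it, which is what makes multistep analysis possible without moving the intermediate data. The sketch trusts the submitted SQL; a real service would need to restrict what a job may do.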
The International Virtual Observatory Alliance is a grassroots group of astronomers who want to federate world-wide astronomy data, cross-index it with the literature, and provide analysis tools to the community. The SDSS data is now part of this federation. A user can point to a SkyServer object and find all the literature on that object, as well as all other public data sets about that object. This is all architected using Web services defined by the IVOA. The IVOA is also defining a standard schema and ontology for astronomy data (www.ivoa.net/Documents).
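Federation works because every archive answers the same question through the same interface. The Python sketch below illustrates the idea with local stand-ins for remote services. The `Archive` interface, the box-shaped query, and the sample rows are hypothetical and are not taken from the IVOA specifications.

```python
from typing import Callable, Dict, List, Tuple

# A hypothetical common interface: every archive answers the same question,
# "which of your objects lie in this RA/Dec box?", with (id, ra, dec) rows.
Row = Tuple[str, float, float]
Archive = Callable[[float, float, float, float], List[Row]]


def make_archive(rows: List[Row]) -> Archive:
    """Wrap a local list of rows as a stand-in for a remote archive service."""

    def query(ra_min, ra_max, dec_min, dec_max):
        return [
            row
            for row in rows
            if ra_min <= row[1] <= ra_max and dec_min <= row[2] <= dec_max
        ]

    return query


def federated_query(archives: Dict[str, Archive], ra_min, ra_max, dec_min, dec_max):
    """Send the same query to every archive and tag each row with its source."""
    results = []
    for name, archive in sorted(archives.items()):
        for obj_id, ra, dec in archive(ra_min, ra_max, dec_min, dec_max):
            results.append((name, obj_id, ra, dec))
    return results
```

With an optical archive and an X-ray archive registered under this interface, one call returns the rows from both for the same patch of sky, each labeled with the archive it came from. Adding another archive means adding one more entry to the dictionary; the caller's code does not change.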
We are now much closer to reaching the goal of the World-Wide Telescope federating all the world's astronomy data and literature. OpenSkyQuery.Net offers a prototype federating 29 archives. The astronomy community is well on the way to building the next generation of this design.
Jim Gray works at Microsoft Research and focuses on eScience scalable systems and databases. He has been active in building online databases like http://cas.sdss.org/dr7/en/. See research.microsoft.com/~gray.
Alexander Szalay is a professor at the Johns Hopkins University in both the Department of Physics and Astronomy and the Department of Computer Science. He is the Project Director of the NSF-funded National Virtual Observatory. See www.sdss.jhu.edu/~szalay.