Graywulf Takes Byte Out of Data Overload
Graywulf is the natural evolution of Beowulf clusters: it brings together HPC clusters and databases for efficient processing and data management. Its name and design also pay homage to Jim Gray, who helped champion the use of relational databases in scientific projects.
In its simplest form, Graywulf means having a database installed on each of the HPC compute nodes. This brings the data to the computation, a point Jim made quite often, and it puts the power of databases (queries, stored procedures, etc.) to work on every node. Since it’s a generic architecture, Graywulf clusters can be built using any OS and any database. The ones in the case study below were implemented using Windows HPC Server and SQL Server, and the motivation was to do the science more efficiently. It’s always great to see innovative folks using technology to do good work.
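To make the idea concrete, here is a minimal sketch of the pattern, not the actual Graywulf implementation. It assumes in-memory SQLite databases standing in for the per-node databases (the case study used SQL Server), and the table name `objects`, the `brightness` column, and the helper functions are all hypothetical. Each "node" runs the aggregate query against its own partition, and only a small partial result travels back to be combined.

```python
import sqlite3


def make_node(rows):
    """Create one hypothetical 'node': a local database holding one data partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, brightness REAL)")
    conn.executemany("INSERT INTO objects VALUES (?, ?)", rows)
    conn.commit()
    return conn


def local_summary(conn, threshold):
    """Run the aggregate inside the node's database.

    Only a (count, sum) pair leaves the node, not the raw rows.
    """
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(brightness), 0.0) "
        "FROM objects WHERE brightness > ?",
        (threshold,),
    ).fetchone()
    return count, total


def cluster_mean(nodes, threshold):
    """Combine the per-node partial results into one cluster-wide mean."""
    partials = [local_summary(node, threshold) for node in nodes]
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else None


# Three toy partitions, one per simulated compute node (made-up values).
nodes = [
    make_node([(1, 10.0), (2, 20.0)]),
    make_node([(3, 30.0), (4, 5.0)]),
    make_node([(5, 40.0)]),
]

mean_brightness = cluster_mean(nodes, threshold=8.0)
print(mean_brightness)
```

In a real cluster the per-node queries would run in parallel on separate machines; the loop here is sequential only to keep the sketch short. The design point is the same either way: the filtering and aggregation happen where the data lives, so the coordinator handles a few numbers per node instead of the full dataset.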
“To put it simply, a scientist needs to be able to live within the data,” says Alexander Szalay, a cosmologist-turned-computer-scientist at The Johns Hopkins University (JHU) in Baltimore, Maryland. The power of information, Szalay says, is determined not so much by its quantity as by how easy it is to access, manipulate and analyze.
“It’s not just about doing the numerical calculations,” adds Andrew Simms, a biomedical health informatics graduate student working on protein structure analysis in Valerie Daggett’s bioengineering laboratory at the University of Washington (UW) in Seattle. “It’s also about assembling the data so we can run calculations while performing analyses and ad hoc explorations and then feed it all back into the data warehouse.”
Astronomers at The Johns Hopkins University and protein scientists at the University of Washington are using inexpensive computer hardware combined with powerful computing and database software to help manage and analyze a growing volume of scientific data.
For details, read the Graywulf case study.