Making Sense of Data Overload: An Innovative Approach to Progressive Data Analysis

It's really good to see that the data-mining work (ProDA) by Cyrus Shahabi of USC is getting more visibility. The use of wavelet compression is a really neat way to deal with large amounts of data, and I can see why folks like Chevron were interested in it.

When Professor Cyrus Shahabi of the University of Southern California decided to tackle the problem of complex data analysis, he was confronted by the limitations of current software. Realizing what an impediment this was for businesses and the scientific community, he began to explore alternative forms of analysis. When he came across signal processing and wavelet compression, he knew he was onto something, and ProDA was born. Since ProDA's creation, NASA’s JPL and Chevron have had major successes using the program to manage their huge datasets. With the help of Microsoft Research’s Smart Client initiative, Shahabi was able to bring ProDA to the next level by making it more compatible with XML, Microsoft Excel, text files, and many more formats. All these changes have made ProDA more accessible and user-friendly.

April 2008 article in IEEE Computer - ProDA: An End-to-End Wavelet-Based OLAP System for Massive Datasets by Cyrus Shahabi (USC)


ProDA employs wavelets to support exact, approximate, and progressive OLAP queries on large multidimensional datasets, while keeping update costs relatively low. ProDA not only supports online execution of ad hoc analytical queries on massive datasets, but also extends the set of supported analytical queries to include the entire family of polynomial aggregate queries as well as the new class of plot queries.
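
To get a feel for what "exact, approximate, and progressive" means here, below is a minimal Python sketch of a progressive range-sum query in the Haar wavelet domain. It is not ProDA's code: the function names (`haar`, `progressive_range_sum`), the one-dimensional data, and the choice of the Haar basis are all assumptions made for illustration. The idea is that a range sum is a dot product between the data and a 0/1 query vector, and an orthonormal wavelet transform preserves dot products, so the query can be answered from wavelet coefficients, largest query coefficients first.

```python
import numpy as np

def haar(v):
    """Orthonormal Haar transform of a vector whose length is a power of two."""
    out = np.asarray(v, dtype=float).copy()
    n = len(out)
    while n > 1:
        half = n // 2
        evens, odds = out[0:n:2].copy(), out[1:n:2].copy()
        out[:half] = (evens + odds) / np.sqrt(2)   # scaled pairwise averages
        out[half:n] = (evens - odds) / np.sqrt(2)  # scaled pairwise differences
        n = half
    return out

def progressive_range_sum(data, lo, hi):
    """Yield successively refined estimates of sum(data[lo:hi]).

    Hypothetical helper for illustration; a real system would store the
    transformed data once instead of transforming it per query.
    """
    query = np.zeros(len(data))
    query[lo:hi] = 1.0                      # indicator vector of the range
    d_hat, q_hat = haar(data), haar(query)
    order = np.argsort(-np.abs(q_hat))      # most significant query terms first
    estimate = 0.0
    for i in order:
        if q_hat[i] == 0.0:                 # remaining terms contribute nothing
            break
        estimate += d_hat[i] * q_hat[i]
        yield estimate

# Example data (made up): watch the estimate converge on sum(data[2:6]).
data = np.array([4, 8, 15, 16, 23, 42, 7, 1], dtype=float)
for step, est in enumerate(progressive_range_sum(data, 2, 6), 1):
    print(step, round(est, 3))
```

Stopping the loop early gives an approximate answer, and running it to the end gives the exact one (up to floating-point rounding). Since the query vector for a range has only a few nonzero wavelet coefficients, the loop usually touches far fewer terms than there are data points.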