Microsoft Codename “Cloud Numerics” Lab Refresh

We are announcing a refresh of the Microsoft Codename “Cloud Numerics” Lab. Thank you to everyone who participated in the initial lab: we gathered your feedback and used it to make improvements and add new features. Your participation is what makes this lab a success.

Here’s what is new in the refresh:

Improved user experience: exception messages are more actionable, the probability distribution function APIs have been refactored, and the deployment utility gives clearer, more actionable feedback. In addition, deployment takes less time and the installer now supports installation on an on-premises Windows HPC cluster. Overall, this refresh makes writing and deploying “Cloud Numerics” applications to Windows Azure more efficient.

More scale-out enabled functions: more algorithms now work on distributed arrays, which significantly increases the breadth and depth of big data algorithms that can be developed with the “Cloud Numerics” Lab. Scale-out functionality was added in the following areas: Fourier transforms, linear algebra, descriptive statistics, pattern recognition, random sampling, similarity measures, set operations, and matrix math.

Array indexing and manipulation: a large part of any data analytics application is preparing data so that it has the right shape and content. With this refresh, “Cloud Numerics” adds advanced array indexing that lets users easily and efficiently set and extract subsets of arrays and apply Boolean filters.
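To make the idea of subset extraction and Boolean filtering concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only and does not use or reproduce the “Cloud Numerics” API; the names data, subset, mask, cleaned, and negatives are made up for this example.

```python
# Conceptual illustration only; this is NOT the "Cloud Numerics" API.
# A small 3x3 array stored as a list of rows.
data = [
    [1.0, -2.0, 3.0],
    [4.0, 5.0, -6.0],
    [-7.0, 8.0, 9.0],
]

# Extract a subset: rows 0..1 and columns 1..2.
subset = [row[1:3] for row in data[0:2]]

# Build a Boolean mask that flags the negative entries.
mask = [[value < 0 for value in row] for row in data]

# Boolean-filtered assignment: set every flagged entry to zero.
cleaned = [
    [0.0 if flagged else value for value, flagged in zip(row, mask_row)]
    for row, mask_row in zip(data, mask)
]

# Boolean-filtered extraction: collect the flagged entries in row-major order.
negatives = [
    value
    for row, mask_row in zip(data, mask)
    for value, flagged in zip(row, mask_row)
    if flagged
]
```

In a distributed setting the same operations would run on array partitions held by different nodes; the sketch only shows the single-machine semantics.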

Sparse data structures and algorithms: many real-world big data sets are sparse, meaning that not every field in a table has a value. This refresh introduces a distributed sparse matrix structure to hold such datasets, along with core sparse linear algebra functions that enable scenarios such as document classification and collaborative filtering.
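As a rough illustration of why a sparse structure helps, the sketch below stores only the non-zero entries of a matrix in compressed sparse row (CSR) form and multiplies it by a vector. CSR is chosen here purely for illustration; the post does not say which layout the lab uses, and the functions to_csr and csr_matvec are hypothetical helpers, not part of “Cloud Numerics”.

```python
# Conceptual illustration only; this is NOT the "Cloud Numerics" API.

def to_csr(dense):
    """Convert a dense list-of-rows matrix to CSR arrays."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)       # non-zero value
                col_indices.append(j)  # its column index
        row_ptr.append(len(values))    # where the next row starts
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x, touching only non-zeros."""
    result = []
    for i in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_indices[k]]
        result.append(total)
    return result


# A tiny document-term style matrix: most entries are zero.
dense = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
values, col_indices, row_ptr = to_csr(dense)
product = csr_matvec(values, col_indices, row_ptr, [1, 1, 1, 1])
```

Only three of the twelve entries are stored, and the product visits only those three.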

Apply/Sweep framework: in addition to the built-in parallelism of the “Cloud Numerics” Lab, this refresh exposes a set of APIs for embarrassingly parallel patterns. The Apply framework applies arbitrary serializable .NET code to each element of an array, or to each row or column of an array. It also provides a set of expert-level interfaces for defining arbitrary array splits. The Sweep framework does what its name implies: it distributes parameter sweeps across a set of nodes, which shortens execution times.
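The two patterns can be sketched in a few lines of Python. This is an assumption-laden illustration, not the “Cloud Numerics” Apply or Sweep API: a local thread pool stands in for cluster nodes, and apply_rows, sweep, and sum_of_squares_error are hypothetical names invented for the example.

```python
# Conceptual illustration only; this is NOT the "Cloud Numerics" API.
# A local thread pool stands in for the nodes of a cluster.
from concurrent.futures import ThreadPoolExecutor


def apply_rows(func, matrix, max_workers=4):
    """Apply func independently to each row of matrix (Apply pattern)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, matrix))


def sweep(func, parameters, max_workers=4):
    """Evaluate func for each parameter value in parallel (Sweep pattern)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(parameters, pool.map(func, parameters)))


def sum_of_squares_error(slope):
    """Squared error of the line y = slope * x against three fixed points."""
    points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    return sum((y - slope * x) ** 2 for x, y in points)


# Apply: reduce each row on its own worker.
row_sums = apply_rows(sum, [[1, 2, 3], [4, 5, 6]])

# Sweep: try several slopes and keep the one with the smallest error.
errors = sweep(sum_of_squares_error, [1.0, 2.0, 3.0])
best_slope = min(errors, key=errors.get)
```

Both patterns work because each unit of work is independent of the others, so no communication between workers is needed until the results are collected.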

Improved IO functionality: we added more parallel readers to enable out-of-the-box data ingress from Windows Azure storage, and we introduced parallel writers.

Documentation: we introduced detailed mathematical descriptions of more than half of the algorithms, using print-quality formulae and best-of-web equation rendering to clarify each algorithm’s mathematical definition and method behavior. In addition, we updated the “Getting Started” wiki and added conceptual documentation to the “Cloud Numerics” help that covers the programming model, the new Apply framework, IO, and so on.

 

Stay tuned for upcoming blog posts:

  • F#: We’ll be distributing an F# add-in for “Cloud Numerics” soon. The add-in exposes the “Cloud Numerics” APIs in a more functional manner and introduces operators, such as matrix multiply, as well as F#-style constructors and indexing for “Cloud Numerics” arrays.
  • Text analytics using sparse data structures

Do you want to learn more about the Microsoft Codename “Cloud Numerics” Lab? Please visit our SQL Azure Labs home page, take a deeper look at the Getting Started material, and sign up to get access to the installer. Let us know what you think by sending email to cnumerics-feedback@microsoft.com.

The “Cloud Numerics” refresh depends on the newly released Azure SDK 1.7 and Microsoft HPC Server R2 SP4. It does not provide support for the Visual Studio 2012 RC.