LINQ to HPC Programmer's Guide (Beta 2)

Important

This will be the final preview of LINQ to HPC, and we do not plan to move forward with a production release. In line with our announcement in October at the PASS conference, we will focus our effort on bringing Apache Hadoop to both Windows Server and Windows Azure. For more information, see "Microsoft to develop Hadoop distributions for Windows Server and Azure" and "Microsoft Expands Data Platform to Help Customers Manage the ‘New Currency of the Cloud’".

This guide is intended for developers who want to create LINQ to HPC applications. LINQ to HPC applications use a High Performance Computing (HPC) cluster to manipulate and analyze very large data sets. LINQ to HPC and the Distributed Storage Catalog (DSC) include services that run on an HPC cluster, as well as client-side components that are invoked by applications. This guide focuses on the client-side components that you use to create applications.

Note

For information about the server-side components of LINQ to HPC and the DSC, see "Overview of LINQ to HPC and the Distributed Storage Catalog (Beta 2)".

Note

Although the sample code in this guide uses Microsoft® Visual C#®, you can write LINQ to HPC applications in any .NET language, including Visual Basic and Visual F#.

LINQ to HPC is based on the Language Integrated Query (LINQ) technology, and on the Microsoft® .NET Framework 3.5. LINQ to HPC applications are .NET 3.5 applications that contain LINQ to HPC queries. The DSC manages the data that is used by the queries.

The guide contains the following sections.

  • Configure a New LINQ to HPC Project in Visual Studio. This section explains how to create a LINQ to HPC project in the Microsoft Visual Studio® development system.

  • The LINQ to HPC and DSC Object Models. This section explains the main object types that are used to create LINQ to HPC queries.

  • Configuring LINQ to HPC Queries. This section explains how to use the HpcLinqConfiguration class to control the behavior of LINQ to HPC queries and DSC operations.

  • Creating DSC File Sets. This section uses different scenarios to explain how to create a DSC file set. For example, it shows what to do if you have text files, or what to do if you have data that is stored as a .NET Framework type.

  • Querying DSC File Sets. This section uses different scenarios to explain how to query data that is in a DSC file set. For example, it shows what to do if the data is in text format, or what to do if it is in binary format.

  • Implementing Distributed Algorithms by Using LINQ to HPC. This section explains advanced scenarios that use LINQ to HPC queries. For example, it shows how to use the Apply and RangePartition operators to create queries that are aware of how data is distributed on the cluster.

  • Managing LINQ to HPC Queries. This section explains how to monitor, cancel, and debug a query, and how to optimize performance and scalability. It also includes a section on migrating to LINQ to HPC from other distributed computing frameworks.

  • DSC Command-Line Reference. This section describes the commands that you can use to configure the DSC.

  • Appendix A: Performing Common Administrative Tasks. This appendix describes how to perform common administrative tasks, such as handling a node failure.

Note

If you are unfamiliar with LINQ, see "LINQ: .NET Language-Integrated Query" on MSDN. You may also be interested in the resources listed on the LINQ page of the .NET Framework Developer Center.