Using Microsoft Sho in F#
Applies to: Functional Programming
Authors: Yin Zhu
Summary: Microsoft Sho is a numerical computing platform with support for linear algebra, optimization, and visualization. This article gives an introduction to Sho and shows how to call its .NET libraries from F#.
This topic contains the following sections.
- Introducing Microsoft Sho
- Referencing Sho Libraries from a Script
- Using Sho Vectors and Matrices
- Calling Linear Algebra Operations
- Visualizing Data Using Sho
- Additional Resources
- See Also
This article is associated with Real World Functional Programming: With Examples in F# and C# by Tomas Petricek with Jon Skeet from Manning Publications (ISBN 9781933988924, copyright Manning Publications 2009, all rights reserved). No part of these chapters may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, electrostatic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.
Introducing Microsoft Sho
Sho is a set of libraries and tools for numerical computing based on .NET, developed at Microsoft Research. The main scripting language used by Sho is IronPython, and the programming interface is reminiscent of Matlab or R. The IronPython scripting layer is a thin layer on top of the comprehensive Sho libraries, which implement a wide range of numerical computing functionality including linear algebra, statistics, optimization, and visualization. These libraries are standard .NET libraries, most of them written in C#, so they are convenient to use from other .NET languages. Languages with interactive scripting support, such as F#, can also provide a scripting environment similar to the IronPython console distributed with Sho.
Notable features of Sho include:
- **Vector and matrix types.** Sho has full support for dense and sparse matrices. All the matrix types support various kinds of matrix slicing and allow creating shallow copies and views of an existing matrix.
- **Intel MKL support.** Sho uses part of Intel MKL for high-performance linear algebra.
- **Support for other Microsoft projects.** Sho has wrappers for various Microsoft products such as Microsoft Chart Controls, Microsoft Solver Foundation, and Microsoft Bioinformatics Foundation. This list is expected to grow in the future.
By integrating all of these libraries into a single environment, Sho provides a comprehensive numerical computing platform for the .NET Framework. This article first shows how to reference Sho libraries from F#, and then focuses on using them for linear algebra and visualization.
Referencing Sho Libraries from a Script
Microsoft Sho can be downloaded from the Sho project website. After downloading and installing Sho, make sure Sho is correctly installed and find the root of its installation folder. The installation folder will be needed when using Sho from F#.
Sho uses unmanaged dynamically linked libraries (DLLs) that cannot be loaded using the usual mechanism for .NET Framework libraries. Most importantly, Sho includes parts of the Intel Math Kernel Library (MKL) to perform high-performance linear algebra operations. Intel MKL is a native, multi-threaded library, which means that operations such as matrix multiplication can be parallelized to use the full potential of multi-core CPUs.
The following steps describe how to use Sho libraries from an F# script. The setting for a compiled F# project is similar.
Referencing Sho Libraries from an F# Script
1. Set the Sho environment variable so that the necessary native runtimes (such as Intel MKL) can be loaded when needed. To do that, set the SHODIR variable to the path containing the native DLL libraries. For .NET Framework 4.0, the default installation folder is C:\Program Files (x86)\Sho 2.0 for .NET 4. The environment variable can be set system-wide or from an F# script.
2. Reference the managed Sho libraries. The main libraries used in this article are MathFunc.dll, MatrixInterf.dll, ShoArray.dll, ShoViz.dll, and PythonExt.dll.
3. Open the Sho namespaces. The root namespace used by Sho is ShoNS. The namespaces used in this article are discussed later in this article.
The following F# code shows this procedure, which is usually included in an F# Script File (.fsx) prior to any code's actual use of the Sho library:
    // Set SHODIR environment variable
    let dir = @"C:\Program Files (x86)\Sho 2.0 for .NET 4"
    System.Environment.SetEnvironmentVariable("SHODIR", dir)

    // Reference Sho runtime
    #r "MathFunc.dll"
    #r "MatrixInterf.dll"
    #r "ShoArray.dll"
    #r "ShoViz.dll"
    #r "PythonExt.dll"

    // Open Sho namespaces
    open ShoNS.Array
    open ShoNS.Visualization
    open ShoNS.MathFunc

    // Add pretty printers for Sho types
    fsi.AddPrinter (fun (da:DoubleArray) -> da.ToString())
    fsi.AddPrinter (fun (fa:FloatArray) -> fa.ToString())
    fsi.AddPrinter (fun (ia:IntArray) -> ia.ToString())
    fsi.AddPrinter (fun (ca:ComplexArray) -> ca.ToString())
    fsi.AddPrinter (fun (dr:DoubleRange) -> dr.ToString())
The listing starts by programmatically setting the SHODIR environment variable so that Sho can locate its unmanaged libraries. Next, it references the core Sho libraries and opens the namespaces containing functionality for linear algebra and charting. Finally, the AddPrinter function is used to register pretty printers for F# Interactive. When an expression returns a DoubleArray value as the result, F# Interactive uses the pretty printer to output a readable textual representation of the data.
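The first step can be made slightly more defensive. The following is a minimal sketch rather than part of Sho: the helper name trySetShoDir is hypothetical, and the folder is the default location quoted above, which may differ on a given machine. It sets SHODIR only when the folder exists, so a wrong path is reported before any native library fails to load.

```fsharp
open System
open System.IO

// Hypothetical helper (not part of Sho): sets SHODIR for the current
// process only if the given folder exists, and reports whether it did.
let trySetShoDir (dir: string) : bool =
    if Directory.Exists dir then
        Environment.SetEnvironmentVariable("SHODIR", dir)
        true
    else
        false

// Assumed default installation folder; adjust for the local machine.
let shoDir = @"C:\Program Files (x86)\Sho 2.0 for .NET 4"

if trySetShoDir shoDir then
    printfn "SHODIR set to %s" shoDir
else
    printfn "Sho folder not found: %s" shoDir
```

Setting the variable this way affects only the current process, which is usually enough for an F# Interactive session.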
Figure 1. The API structure of the Sho toolkit
As shown in Figure 1, Sho consists of three parts:
- **Sho Core Libraries.** The Sho Core libraries mainly contain the types representing matrices. Five basic element types are supported: boolean, int, single, double, and complex. Besides the dense matrix type, Sho also has full support for sparse matrices. Sho uses part of Intel MKL for its linear algebra operations. Aside from matrices and linear algebra, Sho Core also has support for IO, object serialization, and visualization.
- **Packages for External Libraries.** The power of Sho also lies in its external libraries, which include other Microsoft products: Microsoft Solver Foundation and Microsoft Bioinformatics Foundation. The Sho team also releases small libraries for signal processing, statistics, utility functions for Azure, and high-performance computing.
- **Tools & Languages.** The Sho Console is an interactive environment used when writing Sho scripts in IronPython. However, Sho libraries can be accessed from any .NET language, including F#.
This article focuses on the Sho core libraries. The libraries export functionality in the following namespaces:
- ShoNS.Array contains classes representing vectors and matrices and the basic operations for working with them, such as slicing.
- ShoNS.DB contains a small library for accessing data in a database.
- ShoNS.IO contains functions to read and write CSV and general delimited files.
- ShoNS.Pickling provides functionality for data object serialization.
- ShoNS.PythonExtensions contains mainly Python-specific functions for accessing matrices and vectors.
- ShoNS.Visualization contains a wrapper for charting using Microsoft Chart Controls.
The following two sections introduce ShoNS.Array and ShoNS.Visualization using F#. Information about the other parts can be found in the official Sho documentation referenced at the end of this article. After reading the examples below, it should be easy to translate other examples from IronPython (the language used in the Sho documentation) to F#.
Using Sho Vectors and Matrices
Sho includes well-designed programming interfaces for working with vectors and matrices. The design is partly inspired by Matlab. In particular, both double vectors and matrices are represented as a single type named DoubleArray. The following listing shows several examples:
    open System

    let rnd = new Random(1)

    // Create 1x20 vector and fill it with random values
    let vec = new DoubleArray(20)
    vec.FillRandom(rnd)

    // Create a two-dimensional matrix
    let matrix = new DoubleArray(20, 20)
    matrix.FillRandom(rnd)

    // Create a multi-dimensional array
    let threeD = new DoubleArray([| 3; 10; 10 |])
    threeD.FillRandom(rnd)
The listing assumes that the script contains the necessary references to Sho libraries that were shown earlier. Vectors and matrices can be created using the overloaded constructor of DoubleArray. Using a single argument creates a one-dimensional vector, and two arguments can be used to create a 2D matrix. Finally, it is also possible to create an array of arbitrary dimensions using an array of lengths. The example creates an array 3 × 10 × 10 and fills it with random numbers.
Sho also allows you to slice matrices, which provides you with a view of a part of the matrix without the need to copy any of the elements. The following code takes the sixth column of a matrix. The snippet first defines an operator (--) that makes it easier to create slices (formed by two Nullable<'T> values that define the range):
    let (--) a b = Slice(Nullable(a), Nullable(b))

    // Take rows from 0 to 19 and a single column at index 5
    let column = matrix.GetSlice [| 0 -- 19; 5 -- 5 |]
A slice of an array can be obtained by passing an array of Slice values to the GetSlice method. Using a null value as one of the arguments of Slice creates a slice that includes all of the remaining or all of the previous columns or rows. The column value created in the snippet has the type DoubleArray and provides a view of the data stored in the matrix value. This means that modifying column also modifies the original matrix value. A standalone copy can be created using the CopyDeep method:
    let column = matrix.GetSlice( [| 0 -- 19; 5 -- 5 |] ).CopyDeep()
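The difference between a view and a deep copy can be illustrated without Sho. The sketch below is an analogy only: ColumnView is a hypothetical type written for this example, not Sho's implementation, and it works on a plain F# two-dimensional array instead of a DoubleArray.

```fsharp
// Illustrative analogy only: ColumnView is a hypothetical type, not part
// of Sho. It exposes one column of a 2D array without copying elements.
type ColumnView(data: float[,], col: int) =
    member this.Length = Array2D.length1 data

    // Reads and writes go straight to the underlying array.
    member this.Item
        with get (row: int) = data.[row, col]
        and set (row: int) (value: float) = data.[row, col] <- value

    // Returns an independent copy of the column.
    member this.CopyDeep() : float[] =
        Array.init (Array2D.length1 data) (fun row -> data.[row, col])

let grid = Array2D.zeroCreate<float> 20 20
let view = ColumnView(grid, 5)
let copy = view.CopyDeep()

view.[0] <- 42.0   // writes through to grid.[0, 5]
copy.[1] <- 7.0    // changes only the copy; grid.[1, 5] stays 0.0

printfn "grid.[0, 5] = %f, grid.[1, 5] = %f" grid.[0, 5] grid.[1, 5]
```

A view is cheap to create but shares storage with its source, so a deep copy is the safer choice whenever the slice will be modified independently.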
Calling Linear Algebra Operations
The linear algebra functionality in Sho is provided by Intel MKL (licensed from Intel), which is highly optimized for Intel and AMD processors. The following code shows an example that multiplies a matrix by a transposed version of itself:
    // Initialize a big matrix
    let data = new DoubleArray(1200, 20000)
    data.FillRandom(rnd)

    let transposed = data.Transpose()
    let multiplied = transposed * data

    // Performance using multiple threads:
    // Real: 00:00:01.745, CPU: 00:00:10.389, GC gen0: 0, gen1: 0, gen2: 0
The performance of the code can easily be measured using the #time directive in F# Interactive. When executed on a 64-bit server with 12 cores, the elapsed (real) time to run the above code was 1.745 seconds, while the total CPU time summed over all threads was 10.389 seconds. This means that, on this single run, Intel MKL kept roughly six cores busy on average, which corresponds to an approximately sixfold speedup from using a multicore CPU.
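The arithmetic behind that estimate is short enough to spell out. The following sketch uses only the two timings and the core count quoted above; the value names are chosen for this example.

```fsharp
// Timings reported by #time for the matrix multiplication above.
let realSeconds = 1.745    // elapsed (wall-clock) time
let cpuSeconds = 10.389    // CPU time summed over all threads

// Ratio of CPU time to elapsed time: the average number of busy cores.
let speedup = cpuSeconds / realSeconds

// Share of the 12 available cores that was used on average.
let utilization = speedup / 12.0

printfn "speedup ~ %.2f, utilization ~ %.0f%%" speedup (utilization * 100.0)
```

The ratio comes out at about 5.95, so roughly half of the 12 cores were busy on average during this run.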
Thanks to Intel MKL, Sho also provides a set of standard matrix factorization procedures. The following snippet shows an example:
    open ShoNS.PythonExtensions

    // Fix the random generator seed to allow a repeatable experiment
    let rng = new System.Random(1)

    // Generate a random matrix of size 10 x 10
    let a = new DoubleArray(10, 10)
    a.FillRandom(rng)

    // Perform singular value decomposition
    let s = new SVD(a)

    // Recover the matrix and calculate the difference
    let a' = s.U * s.D * s.V.T
    let diff = ArrayOps.Norm(a - a', 1.0)
The snippet creates a random matrix and then performs the singular value decomposition (SVD) using the SVD class. The original matrix can be recovered by multiplying the individual components of the decomposition: U, D, and the transpose of V. To check the difference, the snippet uses the Norm function to calculate the 1-norm of the difference matrix, which is the sum of the absolute values of its entries. When the snippet was run for this article, the value of diff was 1.443983821e-14, which is close to zero, indicating that the decomposed terms recover the original matrix up to floating-point rounding.
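The norm described above (the sum of the absolute values of all entries) is easy to reproduce in plain F#. The sketch below is not the Sho ArrayOps.Norm implementation; entrywiseNorm1 is a hypothetical helper that works on an ordinary two-dimensional array, and the sample values are made up for illustration.

```fsharp
// Plain F# sketch, not Sho's ArrayOps.Norm: sums the absolute values
// of all entries of a two-dimensional array.
let entrywiseNorm1 (m: float[,]) : float =
    m |> Seq.cast<float> |> Seq.sumBy abs

// Made-up sample matrix standing in for the difference a - a'.
let difference = array2D [ [ 1.0; -2.0 ]
                           [ 3.0; -4.0 ] ]

// 1 + 2 + 3 + 4 = 10
printfn "%f" (entrywiseNorm1 difference)
```

For a reconstruction check like the one above, the result is typically compared against a small tolerance rather than against exactly zero, because floating-point rounding accumulates over the matrix products.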
Visualizing Data Using Sho
The recommended approach to visualizing data in F# is to use the FSharpChart wrapper for Microsoft Chart Controls that has been directly designed for F#. More information about the library can be found in Overview: Getting Started with the FSharpChart Library.
However, the Sho toolkit provides an alternative wrapper that can be used as well. This may be a suitable approach when the Sho library is already used for other calculations. For example, the following snippet visualizes the result of the SVD factorization from the previous section, using the value s defined there:
    // Draw line chart with circles for each value
    ShoPlotHelper.Plot(s.D.Diagonal, "ro-")

    // Draw bar chart
    ShoPlotHelper.Bar(s.D.Diagonal)
The snippet creates two plots showing the diagonal of the D matrix created by the factorization. The diagonal of the matrix contains singular values of the decomposition. The first command uses a universal Plot function. In the argument string, r specifies that the plot should be red, o specifies the symbol at each point, and - specifies that the points should be connected by lines. The second command uses the Bar function to create a simple bar chart. Both of the charts are shown in Figure 2.
Figure 2. Singular values of the SVD decomposition
This article introduced several features of the Microsoft Sho toolkit and demonstrated how to use them from F#. Sho is a recent numerical toolkit that is being actively developed inside Microsoft Research. It builds on successful products such as Microsoft Solver Foundation and Intel MKL, and so it provides a set of well-tested and efficient libraries. As demonstrated in this article, the implementation of linear algebra functions, such as matrix multiplication, can also take advantage of modern multicore CPUs. The library can be used very easily from F#, and so F# provides an attractive alternative to the IronPython language that is used by default in Sho.
Additional Resources
This article introduced the Sho library and explained how to use it from F#. Sho is just one of several libraries that can be used to write numerical computations and charting in F#. The following articles review several other options:
As a functional programming language, F# differs in many ways from languages like R, Matlab, or Python. The following articles discuss the key functional concepts and their effect when implementing parallel computations:
The following documents from Microsoft Research contain more information about the Sho toolkit:
Sho: the .NET Playground for Data is the homepage of the Sho project and contains downloads, examples, and documentation.
The Book of Sho provides complete documentation of the Sho toolkit, including the IronPython language as well as the libraries usable from F#.
To download the code snippets shown in this article, go to http://code.msdn.microsoft.com/Chapter-4-Numerical-3df3edee
See Also
This article is based on Real World Functional Programming: With Examples in F# and C#. Book chapters related to the content of this article are:
Book Chapter 10: “Efficiency of data structures” explains how to write efficient programs in F#. The chapter covers advanced functional techniques and explains how and when to use arrays and functional lists.
Book Chapter 12: “Sequence expressions and alternative workflows” contains detailed information on processing in-memory data (such as lists and seq<'T> values) using higher-order functions and F# sequence expressions.
Book Chapter 13: “Asynchronous and data-driven programming” explains how asynchronous workflows work and uses them to write an interactive script that downloads a large dataset from the Internet.
Previous article: Overview: Numerical Libraries for F# and .NET Framework
Next article: Tutorial: Using Math.NET Numerics in F#