07/01/2015

May 2014

Volume 29 Number 5

Azure Insider: Microsoft Azure and Open Source Power Grid Computing

Bruno Terkaly, Ricardo Villalobos | May 2014

Imagine building your own grid computing platform that leverages Microsoft Azure and a large number of connected devices. The goal is to leverage the excess computing power found in modern browsers, sending each client a small amount of JavaScript code and data to perform a computing job. Upon completion, each device connected to this grid sends the results back to a central server residing in Azure.

Something like this is already in place: the Search for Extraterrestrial Intelligence (SETI) project. The search for extraterrestrial life uses large-scale grid, or distributed, computing over the Internet. It monitors space for signs of transmissions from alien civilizations by analyzing electromagnetic radiation in the microwave spectrum. It’s a good example of the power of grid computing.

General-Purpose Grid

In this month’s column, we’ll create a more general-purpose grid computing system. This will let us send specific code and data we want executed on each grid node. For this project, each client browser will receive a chunk of JavaScript along with the information to be processed. This lets us more precisely control the task executed in the browser. The example we’ll present solves many general-purpose computing problems that might come up in the context of grid computing.

The genesis of this work came out of the participation of Microsoft in one of the world’s largest hackathons, TechCrunch Disrupt 2013. Microsoft took third place out of 280 teams. You can see the entire solution at tcrn.ch/OkIchx.

The challenge at a competition like this is you only have two days to complete a project before the judges come in and shoot you down. Besides dealing with sleep deprivation, you have to leverage as many prebuilt components as possible to complete the project on time. Most, if not all, of the technology used in the competition was based on open source software running in Azure. The open source technologies used included Jade, Express, Socket.io, Bootstrap, jQuery and Node.js.

Web Sockets

We relied heavily on the now ubiquitous Web Sockets standard. Web Sockets are part of the HTML5 initiative. They provide a full duplex bidirectional connection over which you can transmit messages between client and server. Web Sockets enable a standardized approach for the server to send content to the browser without being explicitly asked by the client.

This let us exchange messages back and forth while keeping the connection open—creating full communication and orchestration, which is a necessary capability for a grid computing system. Today’s modern browsers such as Firefox 6, Safari 6, Google Chrome 14, Opera 12.10 and Internet Explorer 10 (and later) universally support Web Sockets.

Role of Web Sockets

Web Sockets start working when the client sends a Web Socket handshake request to the server in the form of an HTTP GET request. With Web Sockets, what follows the handshake doesn’t conform to the standard HTTP protocol. Text data frames are sent back and forth in full duplex, with each frame carrying a payload accompanied by a small header. You can split larger messages across multiple data frames.
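
To make the handshake more concrete, here is a minimal sketch of the server's half of it, written against Node's built-in crypto module and following the Web Socket protocol specification (RFC 6455). The function names are ours and purely illustrative. In the project itself, Socket.io does this work for you.

```javascript
// Sketch of the server half of the Web Socket opening handshake (RFC 6455).
// Function names are illustrative; Socket.io normally handles all of this.
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455; every Web Socket server uses the same value.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptKey(clientKey) {
  return crypto
    .createHash('sha1')
    .update(clientKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Build the "101 Switching Protocols" response for a handshake request.
// requestHeaders is expected to use lowercase header names, as Node provides.
function buildHandshakeResponse(requestHeaders) {
  const key = requestHeaders['sec-websocket-key'];
  const upgrade = (requestHeaders['upgrade'] || '').toLowerCase();
  if (!key || upgrade !== 'websocket') {
    return null; // Not a Web Socket handshake; treat it as ordinary HTTP.
  }
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    'Sec-WebSocket-Accept: ' + computeAcceptKey(key),
    '',
    ''
  ].join('\r\n');
}
```

Once the client receives this response and checks the accept value, both sides stop speaking HTTP and start exchanging frames over the same connection.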

The Web Socket plumbing tries to detect if there’s a user agent configured, which would let you establish a persistent communication tunnel. In our implementation, the handshake is simply an HTTP request whose Upgrade header field basically says, “Switch to Web Sockets.” In this article, we’ll use Web Sockets to send executable JavaScript and data to each Web client. Once the job is complete, we’ll use the Web Sockets to send computational results back to the Node.js server. This is the key part of our architecture we’ll explain later.

Running the entire project yourself is quite easy. You can view a brief video that shows the project in action at 1drv.ms/1d79pjo. Before watching the video, you can grab all the code from GitHub at bit.ly/1mgWWwc. Setting up the project to run is straightforward with Node.js:

Start by installing Node.js from nodejs.org
Install Git (git-scm.com) or GitHub (github.com)
Clone your fork with a Git clone (bit.ly/1cZ1nZh)
Run the Node.js package manager (NPM) in the cloned directory to install the project’s packages
Start the project with NPM

You’ll need to install the various Node.js packages highlighted in this column. You can download the packages using the NPM at npmjs.org. You can also learn how to install them with a right-click in Visual Studio at bit.ly/OBbtEF. To learn more about using Visual Studio with Node.js, check out Bruno’s blog post, “Getting Started with Node.js and Visual Studio” (bit.ly/1gzKkbj).

Focus on App.js

The final solution we created actually has two server-side processes. The first and most obvious server-side process is the one that’s breaking the large computing job into smaller pieces and distributing the work and data to connected client browsers. You’ll find that code in App.js.

There’s a second server-side process that provides a portal experience to managing and viewing the large computing jobs executing on the grid. You’ll find that code in Server.js. It provides a real-time dashboard experience, complete with live updating graphs and numbers through a browser (see Figure 1). Our column will focus on the App.js code.

Figure 1 High-Level Grid Architecture

Orchestration Details

Node.js provides some surprisingly powerful abstractions that help you assemble an elegant implementation. First, you need to solve the problem of sending a piece of JavaScript code you wish to execute as part of the large grid job. You also need to send some data the JavaScript code will use.

You can use the Node.js packages Express and Socket.io to accomplish this. It’s not enough to send just one piece of JavaScript code and data to the browser. You still need a way to execute the code against the data and send the result back to the server. You can resolve this with the Index.jade module. This means there’s a second piece of JavaScript code to manage executing the grid code itself.

Three Node.js packages (along with some supporting packages) greatly simplify implementing this architecture. For example, the Express package is a popular package that helps with URL routes, handling requests and views. It also simplifies things such as parsing payloads, cookies and storing sessions.

Another powerful package is Socket.io, which abstracts away Web Sockets and includes convenient features such as broadcasts and multicasts. Socket.io lets you set up bidirectional communication using syntactically identical JavaScript code on both the server and the browser. Socket.io manages the JavaScript that runs in the browser and the server. This is precisely what we believe makes Node.js great. There’s no mental context switching with writing JavaScript that runs on the server versus the client.

Node.js is tightly integrated with Jade, which streamlines the process of creating a Web interface. Jade provides a template-based approach to creating the HTML, in addition to containing the orchestration JavaScript code that manages the communication between server and browser (client).

Taken together, all of the packages referenced in Figure 2 will dramatically reduce the amount of code you have to write. A good Node.js developer understands the language and the built-in capabilities. A great Node.js developer is familiar with the various packages and is skilled at using them efficiently. Do yourself a favor and familiarize yourself with the Node.js Packaged Modules library at npmjs.org.

Figure 2 The Bidirectional Communication of Grid Architecture

Bidirectional Logic

Ultimately, the orchestration between client and server is nothing more than bidirectional state machine logic. For example, the client might be in a waiting state for the JavaScript code or it might be in a waiting state for the data to be received.

The server side will have corresponding states, such as the sending JavaScript state or the sending data state. You’ll notice statements in the Node.js code, such as socket.on('some state'), indicating the server is waiting to receive a magic string to trigger a state change (see Figure 3). Then it can respond appropriately to that event.
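
The following sketch shows this state machine idea in isolation. It is not the project's code: two Node.js EventEmitter objects stand in for the Socket.io connection, so it runs in a single process with no network. The helper name createGridConversation and the squaring job are hypothetical, and only the message names ("ready for job," "job," "ready for data," "process" and "results") come from the project.

```javascript
// Sketch: the client/server conversation as two small state machines.
// EventEmitter stands in for a Socket.io socket; no network is involved.
const EventEmitter = require('events');

// Hypothetical helper that wires up both sides and runs the conversation.
function createGridConversation(jobSource, dataItems) {
  const toServer = new EventEmitter(); // messages sent by the client
  const toClient = new EventEmitter(); // messages sent by the server
  const log = [];                      // state transitions, for inspection
  const remaining = dataItems.slice(); // data not yet handed out
  const results = [];
  let job = null;

  // Server side: send the job when asked, then serve data until none is left.
  toServer.on('ready for job', function () {
    log.push('server: sending job');
    toClient.emit('job', jobSource);
  });
  toServer.on('ready for data', function () {
    if (remaining.length === 0) {
      log.push('server: idle');
      return;
    }
    log.push('server: sending data');
    toClient.emit('process', remaining.shift());
  });
  toServer.on('results', function (value) {
    results.push(value);
  });

  // Client side: wait for the job first, then wait for data.
  toClient.on('job', function (source) {
    job = new Function('return (' + source + ')')();
    log.push('client: waiting for data');
    toServer.emit('ready for data');
  });
  toClient.on('process', function (item) {
    toServer.emit('results', job(item));
    toServer.emit('ready for data');
  });

  log.push('client: waiting for job');
  toServer.emit('ready for job');
  return { log: log, results: results };
}

const conversation = createGridConversation('function (n) { return n * n; }', [1, 2, 3]);
console.log(conversation.results); // prints [ 1, 4, 9 ]
```

Each magic string moves one side into its next state, which is all the orchestration between browser and server amounts to.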

Figure 3 Partial Listing in App.js for Setting up Node.js Packages

// Setup libraries.
var express = require('express');
var routes = require('./routes');
var user = require('./routes/user');
var http = require('http');
var path = require('path');
var socketio = require('socket.io');
var app = express();
var azure = require('azure');
var fs = require('fs');
// Code omitted for brevity.
// Let Jade handle the client-side JavaScript and HTML
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');

Let’s begin by examining the setup code for the Node.js server-side process. The workflow begins when the server opens a port and waits for connections. Express and Socket.io together let the server listen for incoming browser connections on port 3000:

// Create a Web server, allowing the Express package to
// handle the requests.
var server = http.createServer(app);
// Socket.io injects itself into HTTP server, handling Socket.io
// requests, not handled by Express itself.
var io = socketio.listen(server);
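
The listing above creates the server but doesn't show the call that opens the port. As a rough sketch of that step, using only Node's built-in http module, it might look like the following. The plain request handler stands in for the Express app, and the names GRID_PORT, handleRequest and startGridServer are our own assumptions, not names from App.js.

```javascript
// Sketch using only Node's built-in http module. The plain handler below
// stands in for the Express app from Figure 3 (an assumption for brevity).
const http = require('http');

const GRID_PORT = 3000; // The port named in the text.

// Placeholder for the work Express would do: answer every request.
function handleRequest(request, response) {
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('The grid node page would be rendered here.\n');
}

const gridServer = http.createServer(handleRequest);

// Hypothetical helper: the server accepts no connections until this runs.
function startGridServer(port, onListening) {
  return gridServer.listen(port, onListening);
}
```

Calling startGridServer(GRID_PORT, callback) opens the port, and the callback fires once the server is ready for browsers to connect.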

Send JavaScript to the Browser

Once the connection is established, the server waits for a message from the client that indicates the client is ready to receive some JavaScript code through the Web Socket connection. The code in line nine in Figure 4 represents the server waiting for the connection to take place and for the client to send the string “ready for job,” indicating to the server the JavaScript should be sent back to the client.

Figure 4 Partial Listing of Server-Side Code That Distributes JavaScript and Data to Browsers on the Grid

(001) // Code Part 1
(003) // Wait for the browser to say it’s ready for the job.
(005) // If it is, send the JavaScript to the grid node for execution.
(007) // Do the same thing for the data being sent to the browser.
(009) io.on('connection', function(socket) {
(011)   socket.on('ready for job', function() {
(013)     clients++;
(015)     socket.emit('job', 'function process(message){function isInRange(origin,target,range){function toRad(deg){return deg*Math.PI/180}function getDistance(origin,target){var R=6371;var delta={lat:toRad(target.lat-origin.lat),lon:toRad(target.lon-origin.lon)};var start=toRad(origin.lat);var end=toRad(target.lat);var a=Math.sin(delta.lat/2)*Math.sin(delta.lat/2)+Math.sin(delta.lon/2)*Math.sin(delta.lon/2)*Math.cos(start)*Math.cos(end);var c=2*Math.atan2(Math.sqrt(a),Math.sqrt(1-a));return R*c}return getDistance(origin,target)<range}function parseData(data){var parts=data.split(",");return{lat:parts[parts.length-1],lon:parts[parts.length-2]}}var target=parseData(message.body);var origin={lat:37.769578,lon:-122.403663};var range=5;return isInRange(origin,target,range)?1:0}');
(017)   });
(018)
(021) // Code Part 2  Sending data to the browser for processing
(023) // when 'ready for data event' fires off.
(025)   socket.on('ready for data', function() {
(027)     socket.isClient = true;
(029)     sendDataToSocket(socket);
(031)   });
(032)
(035) // Code Part 3 - retrieving the results of the computation.
(037) // A more thorough implementation will aggregate all the
      // results from all the browsers to solve the large computational
(039) // problem that has been broken into small chunks for each browser.
(041)   socket.on('results', function(message, results) {
(043)     messageCount++;
(045)     crimesInRange += results;
(047)   });
(048) }); // End of the 'connection' handler.
(051) // Code Part 4 - A basic method to send data to a connected
      // client with a timeout of 77 ms.
(053) function sendDataToSocket(socket) {
(055)   var data = lines.shift();
(057)   lines.push(data);
(059)   setTimeout(function() {
(061)     // To one client, singular
(063)     socket.emit('process', {
(065)       body: data
(067)     });
(069)   }, 77);
(071) }

At this point, we’re halfway there. The client still needs to request the data to process. The JavaScript sent in part one is basic trigonometric code that calculates the distance between two points from their GPS coordinates and reports whether the target lies within range of a fixed origin. You could substitute any JavaScript code you want here.
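
The job is sent as a single minified string, which makes it hard to read. Here is an expanded sketch of the same calculation (the haversine formula) for reference. We renamed the entry point to processMessage so it doesn't clash with the Node.js process global, added parseFloat for clarity and treated the range as kilometers because the radius constant 6371 is the Earth's mean radius in kilometers. The sample data line in the usage note is made up.

```javascript
// Readable sketch of the minified job string in Figure 4.
var EARTH_RADIUS_KM = 6371; // Mean radius of the Earth in kilometers.

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance in kilometers between two {lat, lon} points,
// computed with the haversine formula.
function getDistanceKm(origin, target) {
  var dLat = toRadians(target.lat - origin.lat);
  var dLon = toRadians(target.lon - origin.lon);
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.sin(dLon / 2) * Math.sin(dLon / 2) *
          Math.cos(toRadians(origin.lat)) * Math.cos(toRadians(target.lat));
  return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// The last comma-separated field is read as latitude, the one before it
// as longitude, matching the original job code.
function parseData(line) {
  var parts = line.split(',');
  return {
    lat: parseFloat(parts[parts.length - 1]),
    lon: parseFloat(parts[parts.length - 2])
  };
}

// Returns 1 if the record lies within range of the fixed origin, else 0.
function processMessage(message) {
  var origin = { lat: 37.769578, lon: -122.403663 };
  var rangeKm = 5;
  var target = parseData(message.body);
  return getDistanceKm(origin, target) < rangeKm ? 1 : 0;
}
```

For example, processMessage({ body: 'incident-1,-122.403663,37.769578' }) returns 1 because the record sits exactly at the origin, while a record with New York coordinates returns 0.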

The code in part two represents the state in which the server waits for the string “ready for data” at line 25 in Figure 4. This signals the browser’s request for data. The JavaScript previously sent in part one will process this data. The code in part three represents the state in which the client browser has finished computations on the sent data. When the server receives the string “results” at line 41, it’s ready to incorporate that browser’s final result for the computing job. At this time, the browser could be sent another job to do more processing, repeating the cycle.
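
The comments in Figure 4 note that a more thorough implementation would aggregate the results from all browsers. One possible shape for that bookkeeping is sketched below. It's an assumption on our part rather than code from the project, and createJobTracker and its method names are hypothetical. Unlike the listing, which recycles data lines indefinitely, this version hands out each chunk once and knows when the job is done.

```javascript
// Sketch of a tracker that hands out each chunk once and sums the results.
// createJobTracker and its methods are hypothetical, not part of App.js.
function createJobTracker(lines) {
  var pending = lines.map(function (line, index) {
    return { id: index, body: line };
  });
  var outstanding = {}; // Chunks sent to a browser but not yet answered.
  var total = 0;
  var completed = 0;

  function isOutstanding(id) {
    return Object.prototype.hasOwnProperty.call(outstanding, id);
  }

  return {
    // Next chunk to send, or null when nothing is left to hand out.
    nextChunk: function () {
      var chunk = pending.shift() || null;
      if (chunk) {
        outstanding[chunk.id] = chunk;
      }
      return chunk;
    },
    // Record one browser's result; duplicate or unknown ids are ignored.
    recordResult: function (id, result) {
      if (!isOutstanding(id)) {
        return false;
      }
      delete outstanding[id];
      total += result;
      completed++;
      return true;
    },
    // Put a chunk back in the queue if the browser holding it disconnects.
    requeue: function (id) {
      if (isOutstanding(id)) {
        pending.push(outstanding[id]);
        delete outstanding[id];
      }
    },
    isDone: function () { return completed === lines.length; },
    total: function () { return total; }
  };
}
```

In the server, the “ready for data” handler would call nextChunk, the “results” handler would call recordResult and a disconnect handler would call requeue for any chunk the departing browser still held.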

The Jade Engine

Jade is a productive HTML view and templating engine integrated into Node.js. Jade greatly simplifies the markup and JavaScript you write for the browser. Figure 5 shows the Jade markup language that defines the UI.

Figure 5 Jade Defines the UI

// Part 1
// This UI markup gets translated into real HTML
// before running on the client.
block content
  h1= title
  p This is an example client Web site. Imagine a beautiful Web site without any advertisements!
  p This page is processing
    span#items
    |&nbsp; jobs per second.
  // Part 2
  // This is the client-side JavaScript code.
  script.
    var socket = io.connect();
    var job = function(id, data) {  };
    var createFunction = function(string) {
      return (new Function( 'return (' + string + ')' )());
    };
    var items = 0;
    function calculateWork() {
      $('#items').text(items);
      items = 0;
    }
    setInterval(calculateWork, 1000);
    socket.on('connect', function() {
      socket.emit('ready for job');
    });
    socket.on('job', function(fn) {
      job = createFunction(fn);
      console.log(fn);
      console.log(job);
      socket.emit('ready for data');
    });
    socket.on('process', function(message) {
      var results = job(message);
      items++;
      socket.emit('results', message, results);
      socket.emit('ready for data');
    });

The code in Figure 5 does two things. First, it shows the job progress in the browser. Second, it receives the JavaScript and data sent by the server, which together represent the computational job it needs to perform. It then executes the job and returns the results to the server.

If you’ve ever wondered how to send JavaScript to a browser for execution, Figure 5 represents the code you’ll need to do this. If you want to know more about how Jade works, we recommend this brilliantly simple explanation at jade-lang.com. The bottom line is you can code up a visual interface without all the tricky HTML tags, angle brackets and so on.
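
The heart of that technique is the createFunction helper in Figure 5, which turns source text into a callable function. Here it is on its own as a small sketch you can run in Node.js or a browser console. The sample job string is made up for illustration.

```javascript
// Sketch of turning JavaScript source text into a callable function,
// mirroring the createFunction helper in Figure 5.
function createFunction(source) {
  // The parentheses make the text parse as a function expression, and the
  // generated wrapper function returns that expression when it is invoked.
  return new Function('return (' + source + ')')();
}

// Made-up job: report the length of the message body.
var sampleJobSource = 'function process(message) { return message.body.length; }';
var sampleJob = createFunction(sampleJobSource);

console.log(sampleJob({ body: 'abc' })); // prints 3
```

Functions built this way are created in the global scope, so the job can't see local variables of the page script. It also means the browser runs whatever text arrives, which is why the integrity of the job string matters.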

There are other aspects of this project we didn’t get the chance to cover. One of the bigger pieces is in Server.js, where the portal experience lives and lets you track the progress of all grid jobs in process. It includes a beautiful UI that’s 100 percent Web-based. It’s a live, constantly updating dashboard, complete with charts and graphs. We also didn’t address the practical aspects of security and the threat of someone hijacking and modifying the JavaScript sent to the client and doing harm.

Wrapping Up

You could adapt all of this for other general-purpose grid computing problems. We think the more important takeaway from this article is the power and flexibility of Node.js. The number of repos on GitHub for Node.js exceeds that of jQuery, a powerful testimony to how Node.js resonates with today’s developers.

We’d like to thank the startup and partner evangelists, whose job it is to help companies and entrepreneurs understand and leverage the Microsoft stack and related technologies, many of which are open sourced. Warren Wilbee, West Region startup manager, seeded the TechCrunch Disrupt team with some of his top players, including Felix Rieseberg, Helen Zeng, Steve Seow, Timothy Strimple and Will Tschumy.

Bruno Terkaly is a developer evangelist for Microsoft. His depth of knowledge comes from years of experience in the field, writing code using a multitude of platforms, languages, frameworks, SDKs, libraries and APIs. He spends time writing code, blogging and giving live presentations on building cloud-based applications, specifically using the Azure platform. You can read his blog at blogs.msdn.com/b/brunoterkaly.

Ricardo Villalobos is a seasoned software architect with more than 15 years of experience designing and creating applications for companies in multiple industries. Holding different technical certifications, as well as a master’s degree in business administration from the University of Dallas, he works as a cloud architect in the DPE Globally Engaged Partners team for Microsoft, helping companies worldwide to implement solutions in Azure. You can read his blog at blog.ricardovillalobos.com.

Terkaly and Villalobos jointly present at large industry conferences. They encourage readers of Azure Insider to contact them for availability. Terkaly can be reached at bterkaly@microsoft.com and Villalobos can be reached at Ricardo.Villalobos@microsoft.com.

Thanks to the following Microsoft technical experts for reviewing this article: Gert Drapers, Cort Fritz and Tim Park.