January 2016

Volume 31 Number 1

[Editor's Note]

Go Big or Go Home

By Michael Desmond | January 2016

Big Data is fast becoming big business, and producing a lot of development activity as a result. Market forecasts from research outfit Wikibon project that spending on Big Data technology and services will reach $49.28 billion in 2018, up from $27.36 billion in 2014.

No surprise, Microsoft is working to empower developers in Big Data scenarios. At the Build conference in San Francisco last April, the company revealed services such as Azure Data Warehouse, Azure Data Lake, and Elastic Databases for Azure SQL Database. As we move into 2016, those efforts are beginning to bear fruit, which explains why this month’s issue of MSDN Magazine is focused on Big Data technologies and development.

Michael Rys, in his feature, “Making Big Data Batch Analytics Easier Using U-SQL,” explores the new U-SQL language, which combines SQL and C# semantics to support Big Data scenarios. Next, Omid Afnan shows how to collect, analyze and act on continuous streams of data in real time, in his article, “Real-Time Data Analytics for .NET Developers Using HDInsight.” The last article in the set, Gaurav Malhotra’s “Creating Big Data Pipelines Using Azure Data Lake and Azure Data Factory,” walks through building a Big Data pipeline using Azure Data Factory to move Web log data to Azure Data Lake Store, and processing that data using a U-SQL script on the Azure Data Lake Analytics service.

Why Big Data and why now? As Afnan notes, 2015 was a big year for Big Data at Microsoft.

“While we have grown internally to multi-exabyte scale in our Big Data scenarios, we have shipped a lot of new capabilities in Azure,” Afnan says. “Advances in HDInsight and the release of Azure Data Lake make Big Data significantly easier for developers. ADL Analytics and Storage let developers focus on the queries and aggregations at the business level by abstracting away the complexity of the distributed computing clusters and map-reduce architecture underneath.”

He also cites recent investments in Visual Studio tooling to streamline coding and debugging for ADL and HDInsight. Still, Afnan says dev shops face a challenge as they encounter the steep learning curve around the “three V’s” of Big Data—velocity, variety and volume—and the distributed, multi-stage processing models used in the space.

“The ideal starting point would be to find a way to start collecting the raw data of interest into a Big Data store, and then start to explore datasets, find insights and eventually operationalize the process,” he says. “Reducing the ramp-up time can be achieved by reducing the number of things you have to manage. Going with a Big Data platform that has a job abstraction is certainly a good way to avoid floundering and giving up.”

Microsoft aims to help dev shops engaged in the challenge, delivering tools and platforms that both expose and streamline new capabilities. These include, says Afnan, new developer tools for “more complex debug scenarios for individual node/vertex failures, exploration of data skew issues, and analysis of multiple runs of the same script.”

There’s a lot to be excited about in the Big Data space. I expect we’ll be seeing plenty more about Big Data in our pages over the months and years to come.

Michael Desmond is the Editor-in-Chief of MSDN Magazine.