shanyu

How does Spark determine partitions for an RDD?

The most fundamental data structure in Spark is the RDD (Resilient Distributed Dataset). An RDD...

Author: Shanyu Zhao Date: 05/08/2018

Understanding and Using HDInsight Spark Streaming

There are plenty of blogs and materials out there talking about Spark Streaming. Most of them focus...

Author: Shanyu Zhao Date: 09/18/2015

Performance Tuning for HDInsight Storm and Microsoft Azure EventHubs

Apache Storm is a popular real-time data processing framework. Microsoft Azure HDInsight provides a...

Author: Shanyu Zhao Date: 05/14/2015

HDInsight Storm Topology Submission Via VNet

1. Introduction. To submit a Storm topology to an HDInsight cluster, a user can RDP to the headnode...

Author: Shanyu Zhao Date: 10/28/2014

Hadoop Yarn memory settings in HDInsight

(Edit: thanks to Mostafa for the valuable feedback; I updated this post with an explanation about the...

Author: Shanyu Zhao Date: 07/31/2014