Distributed computing on the cloud: GraphLab

Beginner
Developer
Student
Azure

GraphLab is a big data tool developed at Carnegie Mellon University to help with data mining. Learn how GraphLab works and why it's useful.

Learning objectives

In this module, you will:

  • Describe the unique features of GraphLab and the types of applications that it targets
  • Recall the features of a graph-parallel distributed programming framework
  • Recall the three main parts in the GraphLab engine
  • Describe the steps that are involved in the GraphLab execution engine
  • Discuss the architectural model of GraphLab
  • Recall the scheduling strategy of GraphLab
  • Describe the programming model of GraphLab
  • List and explain the consistency levels in GraphLab
  • Describe the in-memory data placement strategy in GraphLab and its performance implications for certain types of graphs
  • Discuss the computational model of GraphLab
  • Discuss the fault-tolerance mechanisms in GraphLab
  • Identify the steps that are involved in the execution of a GraphLab program
  • Compare and contrast MapReduce, Spark, and GraphLab in terms of their programming, computation, parallelism, architectural, and scheduling models
  • Identify a suitable analytics engine given an application's characteristics

In partnership with Dr. Majd Sakr and Carnegie Mellon University.

Prerequisites

  • Understand what cloud computing is, including cloud service models and common cloud providers
  • Know the technologies that enable cloud computing
  • Understand how cloud service providers pay for their infrastructure and bill customers for cloud services
  • Know what datacenters are and why they exist
  • Know how datacenters are set up, powered, and provisioned
  • Understand how cloud resources are provisioned and metered
  • Be familiar with the concept of virtualization
  • Know the different types of virtualization
  • Understand CPU virtualization
  • Understand memory virtualization
  • Understand I/O virtualization
  • Know about the different types of data and how they're stored
  • Be familiar with distributed file systems and how they work
  • Be familiar with NoSQL databases and object storage, and how they work
  • Know what distributed programming is and why it's useful for the cloud
  • Understand MapReduce and how it enables big data computing
  • Understand Spark and how it differs from MapReduce