Distributed computing on the cloud: GraphLab

Beginner

Developer

Student

Azure

GraphLab is a graph-parallel distributed computing framework, developed at Carnegie Mellon University, for data mining and machine learning on large datasets. Learn how GraphLab works and why it's useful.

Learning objectives

In this module, you will:

Describe the unique features of GraphLab and the application types that it targets
Recall the features of a graph-parallel distributed programming framework
Recall the three main parts of the GraphLab engine
Describe the steps involved in the GraphLab execution engine
Discuss the architectural model of GraphLab
Recall the scheduling strategy of GraphLab
Describe the programming model of GraphLab
List and explain the consistency levels in GraphLab
Describe the in-memory data placement strategy in GraphLab and its performance implications for certain types of graphs
Discuss the computational model of GraphLab
Discuss the fault-tolerance mechanisms in GraphLab
Identify the steps involved in executing a GraphLab program
Compare and contrast MapReduce, Spark, and GraphLab in terms of their programming, computation, parallelism, architectural, and scheduling models
Identify a suitable analytics engine given an application's characteristics

In partnership with Dr. Majd Sakr and Carnegie Mellon University.

Prerequisites

Understand what cloud computing is, including cloud service models and common cloud providers
Know the technologies that enable cloud computing
Understand how cloud service providers pay for the cloud and bill for its use
Know what datacenters are and why they exist
Know how datacenters are set up, powered, and provisioned
Understand how cloud resources are provisioned and metered
Be familiar with the concept of virtualization
Know the different types of virtualization
Understand CPU virtualization
Understand memory virtualization
Understand I/O virtualization
Know about the different types of data and how they're stored
Be familiar with distributed file systems and how they work
Be familiar with NoSQL databases and object storage, and how they work
Know what distributed programming is and why it's useful for the cloud
Understand MapReduce and how it enables big data computing
Understand Spark and how it differs from MapReduce