Distributed computing on the cloud: Spark

Beginner

Developer

Student

Azure

Spark is an open-source cluster-computing framework whose strengths differ from those of MapReduce. Learn how Spark works.

Learning objectives

In this module, you will:

In partnership with Dr. Majd Sakr and Carnegie Mellon University.

Prerequisites

Understand what cloud computing is, including cloud service models and common cloud providers
Know the technologies that enable cloud computing
Understand how cloud service providers pay for and bill for cloud resources
Know what datacenters are and why they exist
Know how datacenters are set up, powered, and provisioned
Understand how cloud resources are provisioned and metered
Be familiar with the concept of virtualization
Know the different types of virtualization
Understand CPU virtualization
Understand memory virtualization
Understand I/O virtualization
Know about the different types of data and how they're stored
Be familiar with distributed file systems and how they work
Be familiar with NoSQL databases and object storage, and how they work
Know what distributed programming is and why it's useful for the cloud
Understand MapReduce and how it enables big data computing