Message queues and stream processing

Module
8 Units

Beginner

Developer

Student

Azure

As the volume of available data grows, more of it arrives as continuous streams that must be processed in real time. Learn about different systems and techniques for consuming and processing real-time data streams.

Learning objectives

In this module, you will:

Define a message queue and recall a basic architecture
Recall the characteristics of a message queue and present its advantages and disadvantages
Explain the basic architecture of Apache Kafka
Discuss the roles of topics and partitions in Kafka, and how Kafka achieves scalability and fault tolerance
Discuss general requirements of stream processing systems
Recall the evolution of stream processing
Explain the basic components of Apache Samza
Discuss how Apache Samza achieves stateful stream processing
Discuss the differences between the Lambda and Kappa architectures
Discuss the motivation for the adoption of message queues and stream processing in the LinkedIn use case

In partnership with Dr. Majd Sakr and Carnegie Mellon University.

Prerequisites

Understand what cloud computing is, including cloud service models and common cloud providers
Know the technologies that enable cloud computing
Understand how cloud service providers pay for and bill for cloud resources
Know what datacenters are and why they exist
Know how datacenters are set up, powered, and provisioned
Understand how cloud resources are provisioned and metered
Be familiar with the concept of virtualization
Know the different types of virtualization
Understand CPU virtualization
Understand memory virtualization
Understand I/O virtualization
Know about the different types of data and how they're stored
Be familiar with distributed file systems and how they work
Be familiar with NoSQL databases and object storage, and how they work
Know what distributed programming is and why it's useful for the cloud
Understand MapReduce and how it enables big-data computing
Understand Spark and how it differs from MapReduce
Understand GraphLab and how it differs from MapReduce and Spark