Message queues and stream processing

Beginner
Developer
Student
Azure

The growth in available data has led to continuous streams of real-time data that must be processed as they arrive. Learn about different systems and techniques for consuming and processing real-time data streams.

Learning objectives

In this module, you will:

  • Define a message queue and recall a basic architecture
  • Recall the characteristics of a message queue, and present its advantages and disadvantages
  • Explain the basic architecture of Apache Kafka
  • Discuss the roles of topics and partitions in Kafka, as well as how Kafka achieves scalability and fault tolerance
  • Discuss general requirements of stream processing systems
  • Recall the evolution of stream processing
  • Explain the basic components of Apache Samza
  • Discuss how Apache Samza achieves stateful stream processing
  • Discuss the differences between the Lambda and Kappa architectures
  • Discuss the motivation for the adoption of message queues and stream processing in the LinkedIn use case

In partnership with Dr. Majd Sakr and Carnegie Mellon University.

Prerequisites

  • Understand what cloud computing is, including cloud service models and common cloud providers
  • Know the technologies that enable cloud computing
  • Understand how cloud service providers pay for cloud infrastructure and bill customers for its use
  • Know what datacenters are and why they exist
  • Know how datacenters are set up, powered, and provisioned
  • Understand how cloud resources are provisioned and metered
  • Be familiar with the concept of virtualization
  • Know the different types of virtualization
  • Understand CPU virtualization
  • Understand memory virtualization
  • Understand I/O virtualization
  • Know about the different types of data and how they're stored
  • Be familiar with distributed file systems and how they work
  • Be familiar with NoSQL databases and object storage, and how they work
  • Know what distributed programming is and why it's useful for the cloud
  • Understand MapReduce and how it enables big-data computing
  • Understand Spark and how it differs from MapReduce
  • Understand GraphLab and how it differs from MapReduce and Spark