Describe Azure Cosmos DB

Your development team has experience working with nonrelational data stores. You want to use this experience to extend the functionality of the cloud-native solution to include the processing and storage of IoT telemetry generated by smart appliances. After exploring Azure-managed NoSQL offerings, you decided to use Azure Cosmos DB. The following information can help you confirm its suitability as a persistent data store for telemetry data.

What is Cosmos DB?

Azure Cosmos DB is a fully managed, cloud-native NoSQL database. It's one of the Azure foundational services, which means it's available in every Azure region.

As a managed service, Azure Cosmos DB eliminates most traditional database administrative tasks, such as updates or patching of the underlying database engine. It offers automatic and instant scalability, with Service Level Agreement (SLA)-backed guarantees for its performance and responsiveness. It also provides a set of resiliency features that distinguish it from relational databases. These features include globally distributed replicas with multiple-region writes and the ability to implement five different consistency models, ranging from strong to eventual.

Another unique characteristic of Azure Cosmos DB is support for multiple database APIs. When you provision an Azure Cosmos DB account, you can choose your preferred API from among the native Core (NoSQL) API, API for MongoDB, Cassandra API, Gremlin API, and Table API. With the Core (NoSQL) API, you also have the flexibility of selecting your preferred development platform, such as the .NET SDK, Java SDK, Node.js, or Python.

What are the advantages of Cosmos DB over relational databases?

One of the common characteristics of relational database systems is the use of locking, which guarantees their transactional behavior. These guarantees help ensure strong data consistency within each database. While such consistency is desired in many scenarios, it has a negative effect on concurrency, latency, and availability. It's possible to mitigate these negative implications by splitting a database into multiple shards, but this approach is complex to implement and maintain.

Azure Cosmos DB addresses these drawbacks through a combination of its support for different consistency models, built-in replication, and multiple-region writes with a configurable conflict resolution mechanism. This support provides significant performance and resiliency benefits in scenarios where strong consistency isn't a requirement. At the same time, Cosmos DB also supports server-side transactions, if such consistency is necessary.

What is the Cosmos DB resource model?

To implement Azure Cosmos DB, you need to first create an Azure Cosmos DB account in your Azure subscription. The account serves as the unit of distribution and high availability. You have the option of configuring an account to replicate across multiple regions and make each one of these replicas writeable. You can also configure the default consistency level for an account.

When you use the Core (NoSQL) API, API for MongoDB, or Gremlin API, an account can contain one or more databases, with each of them hosting one or more containers. A container is the unit of scalability, allowing you to designate compute and storage resources for processing its content. With the Core (NoSQL) API or API for MongoDB, that content takes the form of JSON documents, referred to as items, without any schema-defined constraints. By default, Azure Cosmos DB automatically indexes all items in a container without requiring explicit index or schema management, but gives you the option of customizing the indexing behavior.
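
The hierarchy described above (account, then databases, then containers, then items) can be pictured with a minimal in-memory sketch. This isn't the Azure SDK. The class names, the account name `contoso-iot`, the region names, and the `/deviceId` partition key path are all hypothetical and serve only to illustrate how the levels nest.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Container:
    """Unit of scalability: holds schema-free JSON-like items."""
    name: str
    partition_key_path: str  # hypothetical path, for example "/deviceId"
    items: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class Database:
    """Groups one or more containers."""
    name: str
    containers: dict[str, Container] = field(default_factory=dict)


@dataclass
class Account:
    """Unit of distribution: regions and default consistency live here."""
    name: str
    regions: list[str]
    default_consistency: str
    databases: dict[str, Database] = field(default_factory=dict)


# Assemble a small hierarchy (all names are illustrative assumptions).
account = Account("contoso-iot", ["West Europe", "East US"], "Session")
database = Database("appliances")
container = Container("telemetry", "/deviceId")

account.databases[database.name] = database
database.containers[container.name] = container

# Items carry no fixed schema, so they're modeled as plain dictionaries.
container.items.append({"id": "1", "deviceId": "fridge-001", "temperatureC": 4.2})
```

In this sketch, replication and consistency settings sit on the account while items sit in containers, which mirrors the split between the unit of distribution and the unit of scalability.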

The amount of resources available to process data within a database or its individual containers depends on the number of available Request Units (RUs). The number of RUs is based on the database or container configuration that you specify. Cosmos DB offers three modes that determine RU allocation, depending on your preferences.

  • Provisioned throughput mode. In this mode, you designate a specific number of RUs to reflect the expected usage patterns. This approach offers the most clarity about the resulting performance and cost.
  • Autoscale mode. In this mode, you preallocate the number of RUs that you consider to be sufficient to address your baseline requirements, but allow for their automatic increase if there's higher demand for data access. This mode is most suitable for mission-critical workloads with variable or unpredictable usage patterns.
  • Serverless mode. In this mode, you don't need to preallocate RUs. Instead, you rely on the autoscaling capabilities of Azure Cosmos DB to increase or decrease the amount of processing resources. This mode might be beneficial from a cost standpoint, if your workloads can tolerate temporary latency following periods of database inactivity.
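
The guidance in the three modes above can be condensed into a small decision helper. This is a simplified sketch based only on the descriptions in this unit. The function name `suggest_mode` and its two yes/no inputs are hypothetical, and a real decision would also weigh cost estimates and service limits.

```python
from enum import Enum


class ThroughputMode(Enum):
    PROVISIONED = "provisioned"
    AUTOSCALE = "autoscale"
    SERVERLESS = "serverless"


def suggest_mode(predictable_load: bool, tolerates_cold_latency: bool) -> ThroughputMode:
    """Map two workload traits to a throughput mode (illustrative heuristic)."""
    if predictable_load:
        # Known usage patterns: a fixed RU allocation gives the clearest
        # picture of performance and cost.
        return ThroughputMode.PROVISIONED
    if tolerates_cold_latency:
        # Variable load that can accept latency after idle periods.
        return ThroughputMode.SERVERLESS
    # Variable or unpredictable load that must stay responsive.
    return ThroughputMode.AUTOSCALE


# Example: bursty telemetry ingestion that must stay responsive.
mode = suggest_mode(predictable_load=False, tolerates_cold_latency=False)
```

For the bursty, latency-sensitive workload in the example, the helper returns `ThroughputMode.AUTOSCALE`.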

What are the benefits and use cases of Cosmos DB in Azure IoT scenarios?

Azure Cosmos DB offers many capabilities that make it suitable for IoT scenarios, including:

  • Partitioning. Azure Cosmos DB automatically partitions containers by using the logical partition key that you specify. Partitioning is the core mechanism behind scalability and resiliency of Azure Cosmos DB. By choosing the partition key, you can accommodate IoT scenarios that require the storing and processing of large volumes of device and telemetry data.

    Note

    A logical partition can't exceed 20 GB in size.

  • Time to Live (TTL). With TTL, Azure Cosmos DB can automatically delete items after a period that you designate. This automation simplifies data lifecycle management and lowers cost, because TTL-based deletions run in the background and consume only leftover RUs that user requests haven't used.

  • Change feed. Azure Cosmos DB uses change feed to automatically trigger an action following changes to items in a container. This automatic trigger simplifies implementing a common IoT design pattern, in which data changes trigger a corresponding action.

  • Service Level Agreements (SLAs) for performance and resiliency. In IoT scenarios that involve large volumes of streaming data, customers can count on latencies under 10 ms at the 99th percentile for reads and writes, and 99.999% availability for multiple-region writes.

  • Schema-less databases. Azure Cosmos DB accommodates the storage of various types of telemetry generated by different device models within the same container, by eliminating schema-based constraints.

  • Automatic indexing. Azure Cosmos DB indexing support contributes to fast and flexible lookups across large volumes of data containing inventories of registered devices and their telemetry.
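
To make the partitioning and TTL capabilities more concrete, here's a minimal sketch that builds telemetry items and estimates how much data each logical partition holds. It assumes a hypothetical `deviceId` partition key and uses the per-item `ttl` property (in seconds), which takes effect only when TTL is enabled on the container. The helper names are illustrative, and the serialized JSON length is only a rough approximation of stored size.

```python
import json
from collections import defaultdict

# 20 GB logical partition limit from the note above; treating it as binary
# gigabytes is an assumption made for this sketch.
LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024**3

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def make_telemetry_item(reading_id, device_id, payload, ttl_seconds):
    """Build one telemetry item; 'deviceId' is the hypothetical partition key."""
    item = {"id": reading_id, "deviceId": device_id, "ttl": ttl_seconds}
    item.update(payload)
    return item


def partition_sizes(items, key="deviceId"):
    """Approximate bytes per logical partition from serialized JSON length."""
    sizes = defaultdict(int)
    for item in items:
        sizes[item[key]] += len(json.dumps(item).encode("utf-8"))
    return dict(sizes)


def partitions_over_limit(sizes, limit=LOGICAL_PARTITION_LIMIT_BYTES):
    """Return the partition key values whose estimated size exceeds the limit."""
    return [key for key, size in sizes.items() if size > limit]


items = [
    make_telemetry_item("r1", "fridge-001", {"temperatureC": 4.2}, THIRTY_DAYS_SECONDS),
    make_telemetry_item("r2", "oven-007", {"doorOpen": False}, THIRTY_DAYS_SECONDS),
    make_telemetry_item("r3", "fridge-001", {"temperatureC": 4.6}, THIRTY_DAYS_SECONDS),
]
sizes = partition_sizes(items)
oversized = partitions_over_limit(sizes)
```

If a single device could produce more data than one logical partition allows within the retention period, a finer-grained key, such as the device ID combined with a date, might be worth considering.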

Azure Cosmos DB accommodates two primary IoT use cases:

  • It stores device telemetry, which facilitates rapid access to telemetry data for visualization, post-processing, and analytics.
  • It stores a device catalog, which accommodates the modeling of IoT devices, entities, and their topology, with each device represented by an item.
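
The two use cases can be pictured side by side with plain dictionaries standing in for items. This is a sketch under stated assumptions: the property names (`parentId`, `deviceId`, `model`), the device names, and the helper functions are hypothetical, and the lookups run in memory rather than against a database.

```python
# Device catalog: one item per device, with a parent reference for topology.
device_catalog = [
    {"id": "hub-01", "type": "gateway", "parentId": None},
    {"id": "fridge-001", "type": "refrigerator", "model": "FR-200", "parentId": "hub-01"},
    {"id": "oven-007", "type": "oven", "model": "OV-50", "parentId": "hub-01"},
]

# Telemetry: different device models emit different fields, which a
# schema-free container can hold together.
telemetry = [
    {"id": "t1", "deviceId": "fridge-001", "temperatureC": 4.2},
    {"id": "t2", "deviceId": "oven-007", "doorOpen": False, "heatingElementW": 1800},
    {"id": "t3", "deviceId": "fridge-001", "temperatureC": 4.6},
]


def children_of(catalog, parent_id):
    """List the IDs of devices attached to the given parent device."""
    return [device["id"] for device in catalog if device.get("parentId") == parent_id]


def telemetry_for(readings, device_id):
    """Select the readings that belong to one device."""
    return [reading for reading in readings if reading["deviceId"] == device_id]


attached = children_of(device_catalog, "hub-01")
fridge_readings = telemetry_for(telemetry, "fridge-001")
```

Here `attached` lists the two appliances behind the gateway, and `fridge_readings` holds the two refrigerator readings.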