IoT using Cosmos DB

Cosmos DB
Databricks
Functions
IoT Hub
Power BI

Scale instantly and elastically to accommodate diverse and unpredictable IoT workloads without sacrificing ingestion or query performance.

Azure Cosmos DB is Microsoft's globally distributed, multi-model database. It was built from the ground up with global distribution and horizontal scale at its core. It offers turnkey global distribution across any number of Azure regions by transparently scaling and replicating your data wherever your users are. You can elastically scale throughput and storage worldwide, and pay only for the throughput and storage you need.

Cosmos DB is well suited to IoT solutions. It can ingest device telemetry data at high rates and serve indexed queries back with low latency and high availability.

Cosmos DB is a multi-model database with wire protocol-compatible API endpoints for Cassandra, MongoDB, SQL, Gremlin, etcd, and Table, along with built-in support for Jupyter Notebook files.

Architecture

Data Flow

  1. Events generated by IoT devices are sent to the analyze and transform layer through Azure IoT Hub as a stream of messages. Azure IoT Hub stores streams of data in partitions for a configurable amount of time.
  2. Azure Databricks, running Apache Spark Streaming, picks up the messages in real time from IoT Hub, processes the data according to the business logic, and sends the data to the serving layer for storage. Spark Streaming can provide real-time analytics, such as calculating moving averages and minimum and maximum values over time periods.
  3. Device messages are stored in Cosmos DB as JSON documents. This is considered the hot data store. Different JSON schemas representing different device vendors can be stored in Cosmos DB as they are or converted to a canonical JSON schema.
  4. The storage layer consists of:
    • Azure Blob Storage - IoT Hub message routing can save the raw IoT device messages to Azure Blob Storage, allowing Blob Storage to act as an inexpensive, long-term cold data store.
    • Azure SQL Database - Use Azure SQL to store your transactional and relational data (for example, billing data and user roles).
    • Azure Synapse Analytics (previously Azure SQL Data Warehouse) - For your solution data warehouse. Populate it with Azure Data Factory, using aggregated data from Cosmos DB and Azure SQL.
  5. Your users can analyze warehoused data with Microsoft Power BI.
  6. Web, mobile, and other applications can be built on the storage layer. For example, you can expose APIs based on the storage layer data for third-party use.
  7. Use the Cosmos DB change feed to execute an Azure Function each time a device message is added or updated in Cosmos DB.
  8. Some device messages (for example, a fault code) may require an action to be performed on the device. Using the Azure IoT Hub Service API, the Azure Function can connect to Azure IoT Hub and perform an action on the device (for example, a reboot) using one of the following:
    • Device twins
    • Cloud-to-device messages
    • Direct methods
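
The per-period aggregates mentioned in step 2 can be illustrated with a minimal plain-Python sketch. It is not Spark Streaming code: it groups readings into fixed (tumbling) windows, a simpler stand-in for the sliding windows a true moving average would use. The function name `window_stats` and the `(timestamp, value)` reading shape are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def window_stats(readings, window_seconds=60):
    """Summarize (timestamp, value) readings per fixed-size time window.

    Hypothetical helper: timestamps are assumed to be timezone-aware datetimes.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Align each timestamp to the start of the window that contains it.
        start = int(ts.timestamp()) // window_seconds * window_seconds
        buckets[start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    base = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sample = [
        (base, 20.0),
        (base + timedelta(seconds=30), 22.0),
        (base + timedelta(seconds=60), 30.0),
    ]
    for window_start, stats in window_stats(sample).items():
        print(window_start.isoformat(), stats)
```

In a real deployment the same grouping would be expressed with the streaming engine's own windowing operators, so that state and late-arriving data are handled by the engine rather than by application code.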

Components

This architecture uses the following Azure components:

  • Azure IoT Hub acts as the cloud gateway, ingesting device telemetry at scale. IoT Hub also supports bi-directional communication back to devices, allowing actions to be sent from the cloud or Azure IoT Edge to the device. Azure IoT Edge can be used to run applications at the edge, such as machine learning models.
  • Azure Databricks with Apache Spark Streaming is located in the transformation and analytics layer. Databricks uses the azure-eventhubs-spark_2.11:2.3.6 Maven library to connect to IoT Hub's Event Hub-compatible endpoint. Apache Spark Streaming is a scalable, fault-tolerant stream processing system that natively supports both batch and streaming workloads.
  • Azure Cosmos DB is a globally distributed, multi-model database.
    • Consistency levels - Cosmos DB supports five consistency levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual), allowing you to trade off read consistency against availability, latency, and throughput.
    • TTL - Azure Cosmos DB can delete items automatically from a container after a certain time period. This allows Cosmos DB to act as a hot data store for recent data, with long-term data stored in Azure Blob cold storage.
    • Change feed - Outputs a sorted list of documents that were changed, in the order in which they were modified. You can create small reactive Azure Functions that are triggered automatically on each new event in your Azure Cosmos container's change feed. Depending on the contents of the JSON document, the Azure Function can connect to the Azure IoT Hub Service API and execute an action on the device using device twins, cloud-to-device messaging, or direct methods.
    • Request Unit (RU) - The measure of throughput in Azure Cosmos DB. RUs are the compute units in which both performance and cost are expressed. With RUs, you can scale up and down dynamically while maintaining availability, balancing cost against performance.
    • Partitioning - The partition key determines how Cosmos DB routes data to the various partitions, and it needs to make sense in the context of your specific scenario. The IoT device ID is generally the "natural" partition key for IoT applications.
  • Azure SQL Database is the relational database for transactional and other non-IoT data.
  • Azure Synapse Analytics is the data warehouse and reporting platform, containing aggregated data from Azure SQL and Cosmos DB. It is intended for enterprise data warehousing and big data analytics.
  • Power BI is a suite of business analytics tools to analyze data and share insights. Power BI can query a semantic model stored in Azure Analysis Services, or it can query Azure Synapse directly.
  • Azure App Service can be used to build web and mobile applications. Azure API Apps can be used to expose data to third parties, based on the data stored in the serving layer.
  • Azure Functions can be used to translate IoT message payloads (for example, from binary to JSON) or to trigger actions when connected to the Cosmos DB change feed. Azure Functions is an event-driven serverless compute platform. Build and debug locally without additional setup, deploy and operate at scale in the cloud, and integrate services using triggers and bindings.
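
As an illustration of the change feed behavior described above, the following sketch shows only the decision step a change-feed-triggered function might perform: inspecting each changed JSON document and choosing one of the three device communication mechanisms. It deliberately makes no Azure SDK calls. The field names (`deviceId`, `faultCode`), the fault code values, and the action payloads are all hypothetical.

```python
def choose_device_action(doc):
    """Pick a device action for one changed document, or None if none is needed.

    Hypothetical mapping: the fault codes and payload shapes are illustrative only.
    """
    fault = doc.get("faultCode")
    if fault is None:
        return None  # Ordinary telemetry: nothing to do on the device.
    device_id = doc["deviceId"]
    if fault == "OVERHEAT":
        # Needs an immediate response, so a direct method fits.
        return ("direct_method", {"deviceId": device_id, "method": "reboot"})
    if fault == "CONFIG_DRIFT":
        # Desired state is a natural fit for device twin desired properties.
        return ("device_twin", {"deviceId": device_id, "desired": {"resetConfig": True}})
    # Anything else becomes a one-way notification to the device.
    return ("cloud_to_device", {"deviceId": device_id, "message": f"Unhandled fault {fault}"})


def handle_changes(docs):
    """Process one batch of changed documents and return the actions to perform."""
    actions = (choose_device_action(doc) for doc in docs)
    return [action for action in actions if action is not None]


if __name__ == "__main__":
    batch = [
        {"deviceId": "dev-1", "temperature": 21.5},
        {"deviceId": "dev-2", "faultCode": "OVERHEAT"},
    ]
    print(handle_changes(batch))
```

Keeping the decision logic free of SDK calls makes it easy to test in isolation; the function that actually invokes the IoT Hub Service API would consume the returned list.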

Alternatives

Considerations

  • Cosmos DB has a 20-GB limit for a single logical partition (previously 10 GB). For most IoT solutions, this size is sufficient. If it is not, we recommend either:
    • Setting the partition key to an artificial field and assigning it a composite value (for example, device ID + current month and year). This ensures an extremely high cardinality of values.
    • Tiering old Cosmos DB data out to cold storage (for example, Azure Blob Storage), using TTL to automatically prune data from Cosmos DB and the change feed to replicate data to cold storage.
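
The two recommendations above can be sketched together when building the item to store. This is a minimal example under stated assumptions: the property name `partitionKey`, the `id` format, and the 30-day default are hypothetical choices, and the per-item `ttl` property (in seconds) only takes effect if time to live is enabled on the container.

```python
from datetime import datetime, timezone


def synthetic_partition_key(device_id, when):
    """Combine the device ID with the year and month, e.g. 'dev42-2024-03'."""
    return f"{device_id}-{when:%Y-%m}"


def build_item(device_id, when, payload, ttl_seconds=30 * 24 * 3600):
    """Return a JSON-serializable item carrying a composite key and a TTL.

    Hypothetical item shape; adapt the property names to your container.
    """
    item = dict(payload)
    item["id"] = f"{device_id}-{when.isoformat()}"
    item["deviceId"] = device_id
    item["partitionKey"] = synthetic_partition_key(device_id, when)
    item["ttl"] = ttl_seconds  # Seconds until the item becomes eligible for deletion.
    return item


if __name__ == "__main__":
    now = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
    print(build_item("dev42", now, {"temperature": 21.5}))
```

With a monthly suffix, each device starts a new logical partition every month, which bounds partition growth at the cost of fanning out queries that span several months.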

Next steps

Review the following articles on IoT and Cosmos DB.