Understand time handling in Azure Stream Analytics

In this article, you learn how to make design choices to solve practical time handling problems in Azure Stream Analytics jobs. Time handling design decisions are closely related to event ordering factors.

Background time concepts

To better frame the discussion, let's define some background concepts:

  • Event time: The time when the original event happened. For example, when a moving car on the highway approaches a toll booth.

  • Processing time: The time when the event reaches the processing system and is observed. For example, when a toll booth sensor sees the car and the computer system takes a few moments to process the data.

  • Watermark: An event time marker that indicates up to what point events have been ingressed to the streaming processor. Watermarks let the system indicate clear progress on ingesting the events. By the nature of streams, the incoming event data never stops, so watermarks indicate the progress to a certain point in the stream.

    The watermark concept is important. Watermarks allow Stream Analytics to determine when the system can produce complete, correct, and repeatable results that don't need to be retracted. The processing can be done in a predictable and repeatable way. For example, if a recount needs to be done for some error handling condition, watermarks are safe starting and ending points.

For additional resources on this subject, see Tyler Akidau's blog posts Streaming 101 and Streaming 102.

Choose the best starting time

Stream Analytics gives users two choices for picking event time: arrival time and application time.

Arrival time

Arrival time is assigned at the input source when the event reaches the source. You can access arrival time by using the EventEnqueuedUtcTime property for Event Hubs input, the IoTHub.EnqueuedTime property for IoT Hub input, and the BlobProperties.LastModified property for blob input.

Arrival time is used by default and is best suited for data archiving scenarios where temporal logic isn't necessary.
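
For example, a minimal archiving query can simply pass events through and capture the arrival time. The following is a sketch only: ehinput, archive, deviceId, and reading are hypothetical input, output, and payload names.

```sql
-- Archiving sketch: no TIMESTAMP BY clause, so events are processed by arrival time.
SELECT
    deviceId,                             -- assumed payload field
    reading,                              -- assumed payload field
    EventEnqueuedUtcTime AS arrivalTime   -- arrival time exposed on Event Hubs input
INTO
    [archive]                             -- hypothetical output alias
FROM
    [ehinput]                             -- hypothetical Event Hubs input alias
```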

Application time (also named Event Time)

Application time is assigned when the event is generated, and it's part of the event payload. To process events by application time, use the TIMESTAMP BY clause in the SELECT query. If TIMESTAMP BY is absent, events are processed by arrival time.

It's important to use a timestamp in the payload when temporal logic is involved, to account for delays in the source system or in the network. The time assigned to an event is available in SYSTEM.TIMESTAMP.
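
For example, the following sketch processes events by a timestamp carried in the payload. The input and output aliases and the eventTime and deviceId fields are hypothetical.

```sql
-- Application-time sketch: TIMESTAMP BY assigns the payload field as the event time.
SELECT
    deviceId,
    System.Timestamp() AS assignedTime    -- the time Stream Analytics assigned to the event
INTO
    [output]
FROM
    [ehinput] TIMESTAMP BY eventTime      -- eventTime is an assumed payload field
```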

How time progresses in Azure Stream Analytics

When you use application time, the time progression is based on the incoming events. It's difficult for the stream processing system to know if there are no events, or if events are delayed. For this reason, Azure Stream Analytics generates heuristic watermarks in the following ways for each input partition:

  • When there's any incoming event, the watermark is the largest event time Stream Analytics has seen so far, minus the out-of-order tolerance window size.

  • When there's no incoming event, the watermark is the current estimated arrival time minus the late arrival tolerance window. The estimated arrival time is the arrival time of the last input event that was seen, plus the time that has elapsed since it was seen.

    The arrival time can only be estimated, because the real arrival time is generated by the input event broker, such as Event Hubs, not on the Azure Stream Analytics VM that processes the events. Both rules are sketched below.
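
    As a rough sketch (not the exact internal implementation), the two rules above can be summarized per input partition as:

    $$
    \text{watermark} =
    \begin{cases}
    \max(\text{event time seen so far}) - \text{out-of-order tolerance window}, & \text{while events arrive} \\
    \text{estimated arrival time} - \text{late arrival tolerance window}, & \text{while no events arrive}
    \end{cases}
    $$

    where the estimated arrival time is the arrival time of the last input event plus the wall-clock time elapsed since it was seen.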

The design serves two additional purposes other than generating watermarks:

  1. The system generates results in a timely fashion with or without incoming events.

    You have control over how timely you want to see the output results. In the Azure portal, on the Event ordering page of your Stream Analytics job, you can configure the Out of order events setting. When you configure that setting, consider the trade-off of timeliness with tolerance of out-of-order events in the event stream.

    The late arrival tolerance window is necessary to keep generating watermarks, even in the absence of incoming events. At times, there may be a period where no incoming events come in, such as when an event input stream is sparse. That problem is exacerbated by the use of multiple partitions in the input event broker.

    Streaming data processing systems without a late arrival tolerance window may suffer from delayed outputs when inputs are sparse and multiple partitions are used.

  2. The system behavior needs to be repeatable. Repeatability is an important property of a streaming data processing system.

    The watermark is derived from the arrival time and the application time. Both are persisted in the event broker and are therefore repeatable. When the arrival time has to be estimated in the absence of events, Azure Stream Analytics journals the estimated arrival time so that processing remains repeatable during replay for failure recovery.

Note that when you choose to use arrival time as the event time, you don't need to configure the out-of-order tolerance or the late arrival tolerance. Because arrival time is guaranteed to be increasing in the input event broker, Azure Stream Analytics simply disregards those settings.

Late arriving events

By definition of the late arrival tolerance window, for each incoming event, Azure Stream Analytics compares the event time with the arrival time. If the event time is outside the tolerance window, you can configure the system to either drop the event or adjust the event's time to be within the tolerance.

After watermarks are generated, the service can potentially receive events with an event time lower than the watermark. You can configure the service to either drop those events or adjust their time to the watermark value.

As part of the adjustment, the event's System.Timestamp is set to the new value, but the event time field itself is not changed. This adjustment is the only situation where an event's System.Timestamp can differ from the value in the event time field, and it may cause unexpected results to be generated.
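
One way to spot adjusted events is to emit both times side by side and compare them downstream. The following is a sketch with hypothetical aliases and an assumed eventTime payload field:

```sql
-- Surface the assigned time next to the payload time; a nonzero difference
-- indicates the event's System.Timestamp was adjusted by a time policy.
SELECT
    deviceId,
    eventTime,                                                       -- original payload time (unchanged)
    System.Timestamp() AS assignedTime,                              -- time actually assigned to the event
    DATEDIFF(millisecond, eventTime, System.Timestamp()) AS adjustmentMs
INTO
    [adjustedEvents]
FROM
    [input] TIMESTAMP BY eventTime
```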

Handle time variation with substreams

The heuristic watermark generation mechanism described here works well in most cases where time is mostly synchronized between the various event senders. However, in real life, especially in many IoT scenarios, the system has little control over the clocks on the event senders. The event senders could be all sorts of devices in the field, perhaps on different versions of hardware and software.

Instead of using a watermark that is global to all events in an input partition, Stream Analytics has another mechanism called substreams. You can use substreams in your job by writing a job query that uses the TIMESTAMP BY clause and the keyword OVER. To designate the substream, provide a key column name after the OVER keyword, such as deviceId, so that the system applies time policies by that column. Each substream gets its own independent watermark. This mechanism is useful for timely output generation when dealing with large clock skews or network delays among event senders.
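
For example, in the following sketch (hypothetical aliases and an assumed deviceId/eventTime payload), the OVER keyword scopes the time policies to each device:

```sql
-- Substream sketch: each deviceId gets its own independent watermark.
SELECT
    deviceId,
    COUNT(*) AS eventCount,
    System.Timestamp() AS windowEnd
INTO
    [output]
FROM
    [input] TIMESTAMP BY eventTime OVER deviceId
GROUP BY
    deviceId,
    TumblingWindow(minute, 5)
```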

Substreams are a unique solution provided by Azure Stream Analytics and are not offered by other streaming data processing systems.

When you use substreams, Stream Analytics applies the late arrival tolerance window to incoming events. The late arrival tolerance decides the maximum amount by which different substreams can be apart from each other. For example, if Device 1 is at Timestamp 1 and Device 2 is at Timestamp 2, the gap between them (Timestamp 2 minus Timestamp 1) can be at most the late arrival tolerance. The default setting is 5 seconds, which is likely too small for devices with divergent timestamps. We recommend that you start with 5 minutes and adjust according to your devices' clock skew pattern.
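
Expressed as a rough constraint (a sketch under the assumptions above, with $t_{\text{device1}}$ and $t_{\text{device2}}$ denoting the current timestamps of two substreams), both substreams keep progressing without adjustments as long as:

$$
\left| t_{\text{device1}} - t_{\text{device2}} \right| \le \text{late arrival tolerance window}
$$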

Early arriving events

You may have noticed another concept, called the early arrival window, that looks like the opposite of the late arrival tolerance window. This window is fixed at 5 minutes and serves a different purpose from the late arrival tolerance window.

Because Azure Stream Analytics guarantees complete results, you can only specify job start time as the first output time of the job, not the input time. The job start time is required so that the complete window is processed, not just from the middle of the window.

Stream Analytics derives the start time from the query specification. However, because the input event broker is only indexed by arrival time, the system has to translate the starting event time to an arrival time. The system can then start processing events from that point in the input event broker. With the early arriving window limit, the translation is straightforward: the starting event time minus the 5-minute early arriving window. This calculation also means that the system drops any event whose event time is more than 5 minutes ahead of its arrival time. The Early Input Events metric is incremented when such events are dropped.
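
A sketch of that translation and the accompanying drop rule, assuming the fixed 5-minute early arrival window described above:

$$
\text{starting arrival time} = \text{starting event time} - 5\ \text{minutes}
$$

$$
\text{an event is dropped when}\quad \text{event time} - \text{arrival time} > 5\ \text{minutes}
$$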

This concept ensures that the processing is repeatable no matter where you start to output from. Without such a mechanism, it would not be possible to guarantee repeatability, as many other streaming systems claim they do.

Side effects of event ordering time tolerances

Stream Analytics jobs have several Event ordering options. Two can be configured in the Azure portal: the Out of order events setting (out-of-order tolerance) and the Events that arrive late setting (late arrival tolerance). The early arrival tolerance is fixed and cannot be adjusted. Stream Analytics uses these time policies to provide strong guarantees. However, these settings sometimes have unexpected implications:

  1. Accidentally sending events that are too early.

    Normally, early events shouldn't be sent, but it can happen if the sender's clock is running too fast. All early arriving events are dropped, so you will not see any of them in the output.

  2. Sending old events to Event Hubs to be processed by Azure Stream Analytics.

    While old events may seem harmless at first, they may be dropped because of the application of the late arrival tolerance. If the events are too old, the System.Timestamp value is altered during event ingestion. Due to this behavior, Azure Stream Analytics is currently better suited for near-real-time event processing scenarios than for historical event processing scenarios. In some cases, you can set the Events that arrive late time to the largest possible value (20 days) to work around this behavior.

  3. Outputs seem to be delayed.

    The first watermark is generated at the calculated time: the maximum event time the system has observed so far, minus the out-of-order tolerance window size. By default, the out-of-order tolerance is configured to zero (00 minutes and 00 seconds). When you set it to a higher, non-zero value, the streaming job's first output is delayed by that value of time (or more), because of how the first watermark time is calculated.

  4. Inputs are sparse.

    When there is no input in a given partition, the watermark time is calculated as the arrival time minus the late arrival tolerance window. As a result, if input events are infrequent and sparse, the output can be delayed by that amount of time. The default Events that arrive late value is 5 seconds. You should expect to see some delay when sending input events one at a time, for example. The delays can get worse when you set the Events that arrive late window to a large value.

  5. The System.Timestamp value is different from the time in the event time field.

    As described previously, the system adjusts the event time by the out-of-order tolerance or late arrival tolerance windows. The System.Timestamp value of the event is adjusted, but not the event time field. This difference can be used to identify events whose timestamps were adjusted. Normally the two values are the same, unless the system changed the timestamp because of one of the tolerances.

Metrics to observe

You can observe a number of the event ordering time tolerance effects through Stream Analytics job metrics. The following metrics are relevant:

| Metric | Description |
| --- | --- |
| Out-of-Order Events | Indicates the number of events received out of order that were either dropped or given an adjusted timestamp. This metric is directly affected by the configuration of the Out of order events setting on the Event ordering page of the job in the Azure portal. |
| Late Input Events | Indicates the number of events arriving late from the source. This metric includes events that have been dropped or have had their timestamp adjusted. This metric is directly affected by the configuration of the Events that arrive late setting on the Event ordering page of the job in the Azure portal. |
| Early Input Events | Indicates the number of events arriving early from the source that have either been dropped, or have had their timestamp adjusted if they were more than 5 minutes early. |
| Watermark Delay | Indicates the delay of the streaming data processing job. See more information in the following section. |

Watermark delay details

The Watermark delay metric is computed as the wall-clock time of the processing node minus the largest watermark it has seen so far. For more information, see the watermark delay blog post.
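
Restating the definition above in formula form:

$$
\text{watermark delay} = \text{wall-clock time of the processing node} - \max(\text{watermark seen so far})
$$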

There can be several reasons this metric value is larger than 0 under normal operation:

  1. Inherent processing delay of the streaming pipeline. Normally this delay is nominal.

  2. The out-of-order tolerance window introduces delay, because the watermark is reduced by the size of the tolerance window.

  3. The late arrival window introduces delay, because the watermark is reduced by the size of the tolerance window.

  4. Clock skew of the processing node generating the metric.

A number of other resource constraints can also cause the streaming pipeline to slow down. The watermark delay metric can rise due to:

  1. Not enough processing resources in Stream Analytics to handle the volume of input events. To scale up resources, see Understand and adjust Streaming Units.

  2. Not enough throughput within the input event brokers, so they are throttled. For possible solutions, see Automatically scale up Azure Event Hubs throughput units.

  3. Output sinks are not provisioned with enough capacity, so they are throttled. The possible solutions vary widely based on the flavor of output service being used.

Output event frequency

Azure Stream Analytics uses watermark progress as the only trigger to produce output events. Because the watermark is derived from input data, it is repeatable during failure recovery and also during user-initiated reprocessing. When you use windowed aggregates, the service produces outputs only at the end of the windows. In some cases, you may want to see partial aggregates generated from the windows. Partial aggregates are not currently supported in Azure Stream Analytics.
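
For example, in a tumbling-window aggregate like the following sketch (hypothetical aliases and fields), one output row per device is produced only when the window closes, that is, when the watermark passes the window end:

```sql
-- Windowed aggregate sketch: results are emitted once per window, at window close.
SELECT
    deviceId,
    AVG(reading) AS avgReading,
    System.Timestamp() AS windowEnd   -- end time of the tumbling window
INTO
    [output]
FROM
    [input] TIMESTAMP BY eventTime
GROUP BY
    deviceId,
    TumblingWindow(minute, 1)
```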

In other streaming solutions, output events could be materialized at various trigger points, depending on external circumstances. In some solutions, the output events for a given time window might be generated multiple times. As the input values are refined, the aggregate results become more accurate. Events could be speculated at first and revised over time. For example, when a certain device is offline from the network, a system might use an estimated value. Later, the same device comes back online, and the actual event data can be included in the input stream. The output results from processing that time window then produce more accurate output.

Illustrated example of watermarks

The following images illustrate how watermarks progress in different circumstances.

This table shows the example data that is charted below. Notice that the event time and the arrival time vary, sometimes matching and sometimes not.

| Event time | Arrival time | DeviceId |
| --- | --- | --- |
| 12:07 | 12:07 | device1 |
| 12:08 | 12:08 | device2 |
| 12:17 | 12:11 | device1 |
| 12:08 | 12:13 | device3 |
| 12:19 | 12:16 | device1 |
| 12:12 | 12:17 | device3 |
| 12:17 | 12:18 | device2 |
| 12:20 | 12:19 | device2 |
| 12:16 | 12:21 | device3 |
| 12:23 | 12:22 | device2 |
| 12:22 | 12:24 | device2 |
| 12:21 | 12:27 | device3 |

In this illustration, the following tolerances are used:

  • Early arrival window is 5 minutes
  • Late arriving window is 5 minutes
  • Reorder window is 2 minutes
  1. Illustration of the watermark progressing through these events:

    Diagram of Azure Stream Analytics watermarks

    Notable processes illustrated in the preceding graphic:

    1. The first event (device1) and second event (device2) have aligned times and are processed without adjustments. The watermark progresses on each event.

    2. When the third event (device1) is processed, the arrival time (12:11) precedes the event time (12:17). The event arrived 6 minutes early, so it is dropped because of the 5-minute early arrival tolerance.

      The watermark doesn't progress in this case of an early event.

    3. The fourth event (device3) and fifth event (device1) have aligned times and are processed without adjustment. The watermark progresses on each event.

    4. When the sixth event (device3) is processed, the event time (12:12) falls below the watermark level, while the arrival time is 12:17. The event time is adjusted up to the watermark level (12:17).

    5. When the twelfth event (device3) is processed, the arrival time (12:27) is 6 minutes ahead of the event time (12:21). The late arrival policy is applied, and the event time is adjusted to 12:22, which is above the watermark (12:21), so no further adjustment is applied.

  2. Second illustration: the watermark progressing without an early arrival policy:

    Diagram of Azure Stream Analytics watermarks without an early arrival policy

    In this example, no early arrival policy is applied. Outlier events that arrive early raise the watermark significantly. Notice that the third event (device1, which arrived at 12:11 with an event time of 12:17) is not dropped in this scenario, and the watermark is raised to 12:15. The fourth event's time is adjusted forward 7 minutes (12:08 to 12:15) as a result.

  3. In the final illustration, substreams are used (OVER the DeviceId). Multiple watermarks are tracked, one per substream. As a result, fewer events have their times adjusted.

    Diagram of Azure Stream Analytics watermarks with substreams

Next steps