question

holomew asked MartinJaffer-MSFT commented

ADF or SQL: Find Peaks Of Data By Values And Sum Over Time Period

Looking for an ADF solution or a SQL query/SP (or maybe even Databricks if necessary): I'm looking to find the sum of peak(s) of given values by type over a time period. Furthermore, if the peak doesn't have a downward trend during the given time period, it should not be counted.

The data, for example:


Type Amount Date
AAA 10 2021-07-11
AAA 15 2021-07-11
AAA 20 2021-07-11
AAA 25 2021-07-11 *Need this peak
AAA 20 2021-07-11
AAA 15 2021-07-11
AAA 10 2021-07-11
AAA 15 2021-07-11
AAA 20 2021-07-11
AAA 25 2021-07-11
AAA 30 2021-07-11 *Need this peak added to previous peak
AAA 25 2021-07-11
AAA 20 2021-07-11
AAA 15 2021-07-11
AAA 10 2021-07-11
BBB 10 2021-07-11
BBB 20 2021-07-11
BBB 30 2021-07-11 *Need this peak
BBB 20 2021-07-11
BBB 10 2021-07-11
CCC 10 2021-07-11
CCC 20 2021-07-11
CCC 30 2021-07-11
CCC 20 2021-07-11
CCC 10 2021-07-11
CCC 20 2021-07-11
CCC 30 2021-07-11
CCC 40 2021-07-11 *This peak won't count because the downtrend happens the next day
CCC 30 2021-07-12
CCC 20 2021-07-12
CCC 10 2021-07-12

The results for given day 2021-07-11 should be:

Type Total
AAA 55
BBB 30
(CCC not included since there was no downtrend for the given date)

I'm not even really sure where to start except for maybe getting the value of each row that has a following row that is less (starts downtrending).

azure-data-factory

Hello @holomew and welcome to Microsoft Q&A. Thank you for bringing an interesting question.

Azure Data Factory is not the right tool for this job. It may be possible to do in ADF, but it would be awkward. Mapping Data Flows could do it because of the map and collect functions.

This question looks like it would do well on StackOverflow.

I notice the dates do not have times, just dates, yet a trend requires order. Are the rows ordered by type, then by date/time?

Let us break this down into subproblems. You want to apply the same process to each "type" (AAA/BBB/CCC). It also sounds like you want to apply the process to each date independently.

A local maximum (peak) can be defined as an element in a series where the previous element is less than the current element and the next element is also less than the current element:

f(N-1) < f(N) > f(N+1)

There are two edge cases: the first and last elements of the day. You demonstrate them both in the case of CCC. They are not included as peaks (even if that does seem odd from a business-logic point of view). To handle this, we simply don't run the comparison on the first and last elements.
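Since you asked for a SQL option: a sketch of this peak rule using the LAG/LEAD window functions (available in SQL Server 2012+ and most modern engines; shown here running on SQLite through Python so it is self-contained). The table and column names (Readings, Type, Amount, Date) and the explicit Seq ordering column are assumptions, not from your schema; SQL tables have no inherent row order, so a sequence column stands in for the "as shown" ordering.

```python
import sqlite3

# In-memory database with an assumed schema; Seq preserves the row order.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Readings (Type TEXT, Amount INT, Date TEXT, Seq INT)")
rows = [("AAA", a) for a in (10, 15, 20, 25, 20, 15, 10, 15, 20, 25, 30, 25, 20, 15, 10)]
rows += [("BBB", a) for a in (10, 20, 30, 20, 10)]
con.executemany(
    "INSERT INTO Readings VALUES (?, ?, '2021-07-11', ?)",
    [(t, a, i) for i, (t, a) in enumerate(rows)],
)

# A row is a peak when both neighbours within the same Type and Date are
# strictly smaller. First/last rows of a partition have NULL neighbours,
# so the comparison fails and they are excluded automatically.
query = """
WITH ordered AS (
    SELECT Type, Amount, Date,
           LAG(Amount)  OVER (PARTITION BY Type, Date ORDER BY Seq) AS prev_amt,
           LEAD(Amount) OVER (PARTITION BY Type, Date ORDER BY Seq) AS next_amt
    FROM Readings
)
SELECT Type, SUM(Amount) AS Total
FROM ordered
WHERE Amount > prev_amt AND Amount > next_amt
GROUP BY Type, Date
"""
print(dict(con.execute(query).fetchall()))  # {'AAA': 55, 'BBB': 30}
```

On the AAA and BBB sample rows this yields AAA = 55 and BBB = 30, matching the expected output above.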


@holomew are you still facing the issue? If my comments did not help, please tell me how I may better assist.


If you found your own solution, please share here with the community.


1 Answer

MartinJaffer-MSFT answered

If the data is always ordered as shown, the comparison could look like this:

 def do_compare(array, index):
     # Only consider rows whose neighbours share the same type and date.
     if (array[index].type == array[index - 1].type == array[index + 1].type
             and array[index].date == array[index - 1].date == array[index + 1].date
             and array[index].value > array[index - 1].value
             and array[index].value > array[index + 1].value):
         # Peak found: add its value to a running total keyed by (type, date).
         increment_peak_tuple_dict(array[index].type, array[index].date, array[index].value)
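A minimal self-contained version of the same idea, where the increment_peak_tuple_dict helper is replaced by a plain dict and the data is abbreviated to the AAA and BBB rows from the question (a sketch, assuming the rows arrive pre-ordered as shown):

```python
from collections import namedtuple, defaultdict

Row = namedtuple("Row", "type value date")

def sum_peaks(rows):
    """Sum local maxima per (type, date); relies on rows being pre-ordered."""
    totals = defaultdict(int)
    # Skip the first and last rows: each lacks a neighbour on one side.
    for i in range(1, len(rows) - 1):
        prev, cur, nxt = rows[i - 1], rows[i], rows[i + 1]
        if (prev.type == cur.type == nxt.type
                and prev.date == cur.date == nxt.date
                and prev.value < cur.value > nxt.value):
            totals[(cur.type, cur.date)] += cur.value
    return dict(totals)

data = [Row("AAA", v, "2021-07-11")
        for v in (10, 15, 20, 25, 20, 15, 10, 15, 20, 25, 30, 25, 20, 15, 10)]
data += [Row("BBB", v, "2021-07-11") for v in (10, 20, 30, 20, 10)]
print(sum_peaks(data))
# {('AAA', '2021-07-11'): 55, ('BBB', '2021-07-11'): 30}
```

This reproduces the expected AAA = 55 and BBB = 30 totals for 2021-07-11.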