Question

Cataster-7485 asked:

Best way to implement a sheet update?

Today, we have a process that involves the following:

  • JIRA

  • Google Sheets/tsv files

  • C# code

  • Python code

  • Manual intervention

There is a C# project hosted on Bitbucket that has Azure Functions. Among the important functions, one stands out: ReadFrameGoogleSheetCols (code for it below).

Data is read from the database, and then a function uploads the generated worksheet to a JIRA task (via webhooks).

worksheet123.tsv sample file:
[image: sheet1]

My colleague downloads the sheet attached to the respective JIRA task (e.g. worksheet123.tsv) and uses it as the input file to run Python machine learning code that generates ProductItemIDs from columns A, B, D, and E of the worksheet.

[image: python code]

The output of this Python run is a mappings.tsv file that contains columns A and B of the input file as well as the generated ProductItemID column (as column C). A rough sketch of this step follows the sample below.

mappings.tsv output file sample:
[image: ProdItemID]
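For illustration, here is roughly the shape of that step. This is not her actual model code; generate_product_item_id is just a placeholder, and I'm pretending the headers are literally named A, B, D, and E:

    import pandas as pd

    def generate_product_item_id(row):
        # Stand-in for the real ML model that derives an ID from columns A, B, D, E.
        return "PID-" + str(abs(hash((row["A"], row["B"], row["D"], row["E"]))) % 10**6)

    df = pd.read_csv("worksheet123.tsv", sep="\t")
    out = df[["A", "B"]].copy()
    out["ProductItemID"] = df.apply(generate_product_item_id, axis=1)
    out.to_csv("mappings.tsv", sep="\t", index=False)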

Then, she copies the ProductItemID column and pastes it into the original file (e.g. worksheet123.tsv) on Google Drive. That is the manual part today.

We want to automate this process, and refine the workflow.

As you can probably spot from the images/steps, my colleague hosts all this on her machine today.

We are thinking of utilizing Blob Storage to host the input file, perhaps by creating a webhook in the C# code to generate the file there. Then some form of Azure file watcher would notice whenever there's a new input file, feed it into the Python code to generate the mappings.tsv file with ProductItemIDs, and copy/paste those back into the original input worksheet as an additional column. (A rough sketch of what I imagine is below.)
This means that the Python code has to run on Azure, of course, because today my colleague runs it locally on her machine. I am not sure if that's possible...
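To make the idea concrete, here is a minimal sketch of what that blob-triggered function might look like, using the Azure Functions Python v2 programming model. The container names ("input"/"output") and run_mapping_model are assumptions for illustration only, not our real code:

    import csv
    import io

    import azure.functions as func

    app = func.FunctionApp()

    def run_mapping_model(rows):
        # Placeholder for my colleague's ML step: echoes columns A and B
        # with a dummy ProductItemID so the sketch is self-contained.
        return [[r[0], r[1], f"PID-{i}"] for i, r in enumerate(rows[1:], start=1)]

    @app.blob_trigger(arg_name="inputblob", path="input/{name}.tsv",
                      connection="AzureWebJobsStorage")
    @app.blob_output(arg_name="outputblob", path="output/{name}-mappings.tsv",
                     connection="AzureWebJobsStorage")
    def generate_mappings(inputblob: func.InputStream, outputblob: func.Out[str]):
        # Parse the uploaded worksheet TSV.
        rows = list(csv.reader(io.StringIO(inputblob.read().decode("utf-8")),
                               delimiter="\t"))
        mappings = [["A", "B", "ProductItemID"]] + run_mapping_model(rows)
        buf = io.StringIO()
        csv.writer(buf, delimiter="\t").writerows(mappings)
        outputblob.set(buf.getvalue())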

We are also considering a message queue instead of a file watcher, along these lines:
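If we went the queue route, I imagine the C# webhook would enqueue the blob name and a function like this would pick it up (the queue name is made up):

    import azure.functions as func

    app = func.FunctionApp()

    @app.queue_trigger(arg_name="msg", queue_name="worksheets",
                       connection="AzureWebJobsStorage")
    def on_worksheet_queued(msg: func.QueueMessage):
        blob_name = msg.get_body().decode("utf-8")
        # ...download the named blob, run the model, and upload mappings.tsv
        # exactly as in the blob-trigger sketch above...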

Either way, I'm looking for advice/recommendations on streamlining our current process. What services on Azure would be helpful for this? Is the file watcher/message queuing method the right/best approach here? Are there better alternatives? Suggestions are welcome!!


Note: I was not sure of the right forum to post this in, but since I think we may utilize Blob Storage or maybe a data lake, and there are some Azure Functions involved, I tagged it under the respective tags. There may be other, better ways to do this, though, so I'd appreciate some guidance!

ReadFrameGoogleSheetCols()

     // Reads selected columns from a Google Sheet into a dictionary keyed by column header.
     // Requires Google.Apis.Sheets.v4, Google.Apis.Sheets.v4.Data, and System.Linq.
     // Note: despite the name, startcol is the 1-based row index of the header row.
     public static Dictionary<string, List<string>> ReadFrameGoogleSheetCols(string fileId, string sheetCol, int startcol = 1)
     {
         var dc = new Dictionary<string, List<string>>();
         var sheetname = Helper.GetSetting("GoogleSheetName");

         // Column list: the configured mapping (warning/status) column plus the requested columns.
         var cols = Helper.GetSetting("SheetSPcolumn_Mapping") + "," + sheetCol;

         if (!string.IsNullOrEmpty(sheetCol))
         {
             var col1 = cols.Split(',')[0];
             cols = $"{col1},{sheetCol}";
         }

         // First request: read the header row to locate the columns we need.
         String range = $"{sheetname}!A{startcol}:ZZ{startcol}";

         GoogleAPIService api = new GoogleAPIService();
         SpreadsheetsResource.ValuesResource.GetRequest request =
                api.GoogleSheetService().Spreadsheets.Values.Get(fileId, range);
         ValueRange response = request.Execute();
         IList<IList<Object>> values = response.Values;
         if (values != null && values.Count > 0)
         {
             var iWarning = 0;
             var iMaxCol = 0;

             // Index of the warning/status column in the header row.
             for (var i = 0; i < values[0].Count; i++)
             {
                 if (values[0][i].ToString().ToLower().Equals(cols.Split(',')[0].ToLower()))
                 {
                     iWarning = i;
                     break;
                 }
             }

             // Offsets (relative to the warning column) of the requested columns.
             List<int> lCols = new List<int>();

             for (var i = iWarning + 1; i < values[0].Count; i++)
             {
                 if (cols.ToLower().Contains(values[0][i].ToString().ToLower()))
                 {
                     iMaxCol = i;
                     lCols.Add(i - iWarning);
                 }
             }

             var startRow = startcol + 1; // data starts on the row below the header

             // Second request: read the data rows from the warning column through the last matched column.
             range = $"{sheetname}!{api.GetColumnName(iWarning)}{startRow}:{api.GetColumnName(iMaxCol)}";
             SpreadsheetsResource.ValuesResource.GetRequest reqSP =
               api.GoogleSheetService().Spreadsheets.Values.Get(fileId, range);

             ValueRange resSP = reqSP.Execute();
             IList<IList<Object>> valsp = resSP.Values;

             // Guard against an empty result before touching the rows.
             if (valsp == null || valsp.Count == 0)
                 return dc;

             // Return null if any row's status is not OK.
             var lstwarnings = (from row in valsp
                                where !row[0].ToString().ToUpper().StartsWith("OK")
                                select row[0].ToString()).ToList();

             if (lstwarnings.Count > 0)
                 return null;

             var colSP = sheetCol.Split(',');
             var icol = 0;

             // For each requested column, collect values from OK rows, skipping "--" placeholders.
             foreach (var i in lCols)
             {
                 var lstsp = (from row in valsp
                              where row.Count > i
                                    && row[0].ToString().ToUpper().StartsWith("OK")
                                    && !row[i].ToString().StartsWith("--")
                              select row[i].ToString()).ToList();
                 dc.Add(colSP[icol], lstsp);
                 icol++;
             }

             if (!ValidateHeaders(colSP, dc))
                 return null;
         }

         return dc;
     }


Tags: dotnet-csharp, azure-functions, azure-blob-storage, azure-data-lake-storage

1 Answer

PramodValavala-MSFT answered:

Azure Functions would be the ideal choice for all things related to code (like the Python part) in your scenario.

For the sheets, you could even leverage Logic Apps by using the Excel and/or Google Sheets connectors, reducing the amount of code that you need to manage, assuming the connectors satisfy all your requirements.

And instead of a file watcher, you could leverage Blob Storage Events (powered by Event Grid) directly, without polling or implementing your own queue.
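For instance, a minimal sketch of an Event Grid-triggered Python function reacting to the standard Microsoft.Storage.BlobCreated payload (the Event Grid subscription wiring on the storage account is assumed):

    import azure.functions as func

    app = func.FunctionApp()

    @app.event_grid_trigger(arg_name="event")
    def on_blob_created(event: func.EventGridEvent):
        data = event.get_json()  # Microsoft.Storage.BlobCreated event data
        blob_url = data["url"]   # URL of the newly created blob
        # ...fetch the blob and kick off the mapping step...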



@PramodValavala-MSFT
Awesome, thanks for the suggestions! I had never heard of these services, so I truly appreciate your input!
Btw, I read through the documentation and it looks like there's minimal need for code, but then how would we extract/copy the ProductItemID column generated from the Python model output and paste/insert it into the original worksheet used as input?

Also, can you provide implementation steps/suggestions? I'm trying to craft an architectural diagram of this proposal, and any details would be appreciated! Thanks!


@Cataster-7485 The Python function could write the file to Blob Storage using the output binding, which, via Blob Storage Events, would trigger a Logic App that parses and extracts the required values and updates the Google Sheet using the respective connector.

If your Python function could output JSON, you could directly use the Parse JSON action; otherwise, you would have to use Inline Code or an Azure Function for the same. While possible, that would be complex to do in the Logic App directly, I suppose.
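For example, if the function wrote a payload shaped like this (the field names are illustrative), the Parse JSON action could pick the values straight out of it:

    import json

    # Illustrative output shape only; the real column names are yours.
    payload = json.dumps({
        "mappings": [
            {"colA": "foo", "colB": "bar", "ProductItemID": "PID-1"},
        ]
    })
    # outputblob.set(payload)  # written via the blob output binding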



@PramodValavala-MSFT Thank you very much, Pramod! Do you know if it's possible to run an .ipynb code file in this architecture/Logic Apps/Inline Code, or does it only accept the .py file extension?
My colleague currently uses Jupyter to run this Python script, but I realized the extension is .ipynb; she was surprised too, and wasn't sure if that's because she's running it in Jupyter. She did express a desire to use Jupyter instead of Visual Studio Code. Do you know if Logic Apps or the architecture discussed here would be compatible with/support Jupyter, or does she have to adapt to Visual Studio Code?


@PramodValavala-MSFT In addition to my previous comment: I was looking at the Azure Blob Storage event model and came across Azure Automation. Do you think that service would be of any use in this scenario, maybe instead of Logic Apps? Or do you feel Logic Apps works much better for the scenario we have here? Why/why not?
