Extract files from .tar.gz files stored in a Blob container

elastiSol 1 Reputation point
2020-09-11T17:23:46.507+00:00

Is it possible to extract files from .tar.gz using Azure Data Factory or Functions App to be ingested by ETL process in ADF?

I tried using 7zip in a Functions App. It worked fine for extracting a test .tar.gz file uploaded to the Functions App itself, but it throws an error for files stored in a Blob container.

Here's my command in run.ps1 in Functions

Set-Location D:\home\site\wwwroot\tools

The tar file is stored locally and it works

.\7za.exe x 1.tar.gz

Below, the file is a blob in the Blob container, and it doesn't work

.\7za.exe x $InputBlob

Below is the error I receive for the command above

2020-09-11T16:06:42.721 [Error] ERROR: Program '7za.exe' failed to run: StandardOutputEncoding is only supported when standard output is redirected.
At D:\home\site\wwwroot\tools\run.ps1:9 char:1
+ .\7za.exe e $InputBlob
+ ~~~~~~~~~~~~~~~~~~~~~~.
Exception :
Type : System.Management.Automation.ApplicationFailedException
ErrorRecord :
Exception :
Type : System.Management.Automation.ParentContainsErrorRecordException
Message : Program '7za.exe' failed to run: StandardOutputEncoding is only supported when standard output is redirected.
At D:\home\site\wwwroot\tools\run.ps1:9 char:1
+ .\7za.exe e $InputBlob
+ ~~~~~~~~~~~~~~~~~~~~~~.
HResult : -2146233087
CategoryInfo : ResourceUnavailable: (:) [], ParentContainsErrorRecordException
FullyQualifiedErrorId : NativeCommandFailed
InvocationInfo :
ScriptLineNumber : 9
OffsetInLine : 1
HistoryId : -1
ScriptName : D:\home\site\wwwroot\tools\run.ps1
Line : .\7za.exe e $InputBlob
PositionMessage : At D:\home\site\wwwroot\tools\run.ps1:9 char:1
+ .\7za.exe e $InputBlob
+ ~~~~~~~~~~~~~~~~~~~~~~
PSScriptRoot : D:\home\site\wwwroot\tools
PSCommandPath

Azure Functions
Azure Data Factory

2 answers

  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2020-09-13T19:04:51.027+00:00

    Hi @elastiSol ,

    Welcome to Microsoft Q&A forum and thanks for your query.

    Since this query spans two Azure services (ADF and Azure Functions), I would like to provide my input on the ADF part first.

    If your data format is anything other than AvroFormat, OrcFormat, or ParquetFormat, you can try the compression settings (GZIP or TarGZIP) in your source dataset's connection settings, as shown below, to decompress the files.

    24277-image.png

    Ref: ADF Compression Support

    24311-image.png

    Unfortunately, there is no out-of-the-box functionality in ADF to extract the contents of a TAR file. There is an existing User Voice feature request for this; I would encourage you to up-vote and/or comment on it to help raise the priority of the feature.

    https://feedback.azure.com/forums/270578-data-factory/suggestions/34575520-support-extracting-contents-from-tar-file

    But as a workaround you could try using the extensibility features of Azure Data Factory to transform files that aren't supported. Two options include Azure Functions and custom tasks by using Azure Batch (Custom Activity in ADF).

    You can see a sample that uses an Azure function to extract the contents of a tar file: Untar Azure File With Azure Function Sample

    Going back to the second part of the ask, i.e., error from Azure function apps, let me reach out to Integration folks to better assist on this.

    Hope the above info helps. Will get back to you once I have an update from the internal team regarding the error from function apps.

    Thank you


  2. Mike Urnun 9,666 Reputation points Microsoft Employee
    2020-09-17T02:34:35.577+00:00

    Hello @elastiSol

    You can certainly use Azure Functions for this task. Here's a tutorial video in which a .zip file in one blob container is unzipped into another blob container: https://www.youtube.com/watch?v=GRztpy337kU

    Github repo: https://github.com/FBoucher/AzUnzipEverything

    Although the above is a demonstration with zip files, you can certainly use a similar setup for 7zip or tar.gz by implementing their respective SDKs in your Functions code. Hope this helps :)
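
    As a rough illustration of that idea for .tar.gz, here is a minimal sketch in Python (not the PowerShell used in the question) that unpacks an archive held in memory, using only the standard-library tarfile and io modules. It assumes the function receives the blob's contents as bytes rather than a file path, which may also be why passing $InputBlob straight to 7za.exe fails. The names extract_tar_gz and make_sample_tar_gz are hypothetical helpers for this example, and reading from or writing to Blob storage is deliberately left out.

```python
import io
import tarfile
from typing import Dict


def extract_tar_gz(blob_bytes: bytes) -> Dict[str, bytes]:
    """Return {member name: contents} for each regular file in a .tar.gz payload.

    Hypothetical helper: assumes the whole archive fits in memory.
    """
    extracted: Dict[str, bytes] = {}
    # "r:gz" reads a gzip-compressed tar archive from any file-like object.
    with tarfile.open(fileobj=io.BytesIO(blob_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            handle = archive.extractfile(member)
            if handle is not None:
                extracted[member.name] = handle.read()
    return extracted


def make_sample_tar_gz() -> bytes:
    """Build a tiny in-memory .tar.gz so the sketch runs without Azure."""
    payload = b"id,value\n1,42\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="data/sample.csv")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    for name, content in extract_tar_gz(make_sample_tar_gz()).items():
        print(name, len(content))
```

    In a real function, the returned name/bytes pairs would then be written to the destination container (for example through an output binding or the storage SDK) so that ADF can pick up the extracted files. For large archives, streaming members one at a time would likely be preferable to holding everything in memory.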
