question

JonasTrumpfheller-7049 asked MartinJaffer-MSFT answered

How to get .zip data from the web into my DataFactory

Hello everyone,

First of all, I am relatively new to the world of Azure.

I think it's best if I first explain what exactly I'm planning and what my project is.

I'm trying to save data from a certain website, which is in a zip format, into Azure, analyze it and then visualize it in the end and make it available for other people.

The problem I have now is that I can't manage to connect the website and Azure.
I have tried to connect to the website with Data Factory and copy the data to a Data Lake using the Copy Data activity, but this did not work.

So now my question is, how do I manage to store the data from the Internet in Azure and process it afterwards?

Do I need the DataFactory at all? Am I forgetting a service completely? Will my project work at all?

I'm a bit desperate and unfortunately can't get any further without help, as Azure is simply too complicated and too extensive for me at some points.

Thanks a lot for your help!

azure-data-factory azure-data-lake-storage

Hello @JonasTrumpfheller-7049 and welcome to Microsoft Q&A.

I'm sorry you are having difficulty, but without further details about exactly what went wrong, I cannot be of much help.

When you said

but this did not work.

What exactly happened? Was there an error message? Did the read fail, or the write fail?
What source connector are you using? HTTP connector?
What sink dataset are you using? Binary, delimited text, or something else?
What compression settings are you using on source and sink datasets?


Hello @MartinJaffer-MSFT,

First of all, thank you for your reply. I apologize for taking so long to respond.

But now to the questions you asked:

I tried to use the Copy Data wizard in Azure Data Factory Studio to copy .zip data from a web page on the internet.

The problem is that there are multiple .zip files in the path I need to specify as a relative URL, and I only need one in particular.

When I configure the wizard as shown in the screenshots, I get a Data Preview, but not with the data I want.

Likewise, I don't know what to specify as the FolderPath.

As soon as I select a Compression Type in the DataSet settings I get an error message:
Your HttpServer source can't support random read which is required by current copy activity settings, please create two copy activities to work around it: the first copy activity binary copy your HttpServer source to a staging store (like Azure Blob, Azure Data Lake, File, etc.), second copy activity copy from the staged file store to your destination with current settings. Activity ID: e8994e7f-d110-4d7d-83c2-e0f0c4f1b8fe
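As background on the "random read" requirement: it comes from the zip format itself, not from Data Factory. A zip reader must first locate the central directory record at the *end* of the archive, which requires seeking backwards — something a plain sequential HTTP stream cannot do. A minimal sketch with Python's standard library illustrates the layout:

```python
import io
import zipfile

# Build a small zip archive in memory to inspect its layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "id,value\n1,42\n")
raw = buf.getvalue()

# The End of Central Directory record (signature PK\x05\x06) sits at the
# very end of the archive; a reader must seek backwards to find it before
# it can read any member, hence the need for random access.
eocd = raw.rfind(b"PK\x05\x06")
print(eocd, len(raw))
```

This is why the error message suggests staging the file first: once the archive is on a seekable store such as Blob storage, the decompressing copy can seek freely.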

I really hope that you can help me now because I am stuck right now.

Thank you!

121191-fileformatsettings.png
121192-dataset.png
121010-perview-data.png



These are the other two screenshots I made.

121182-destinationdatastore.png
121193-httpserver.png



I haven't heard back from you. Are you still facing the issue?


If you found your own solution, please share it here with the community.


1 Answer

MartinJaffer-MSFT answered

Okay, I have a better understanding now.

In this case, I agree with the error message's suggestion. Breaking this down into 2 parts will give you much better control. Not everything can be done in a single step.

Instead,

HTTP (Binary) -> Blob (Binary) ... then ... Blob (Text, compressed) -> Blob (Text, uncompressed),
where the Blob (Binary) and Blob (Text, compressed) datasets point to the same location.

The Copy Data wizard only creates one copy activity at a time, I think. However, it can be used to create each of the activities; then you can put the two into the same pipeline, linked by a green on-success dependency.

Hmm, maybe it is worth either doing the compressed -> uncompressed step first, or adding it as a middle step, instead of putting it last.
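The two-step idea above can be sketched outside of Data Factory as well. This is not the actual ADF configuration, just a Python standard-library analogy (paths and the member name are made up for illustration): step 1 is a pure binary copy of the archive to a seekable staging location, and step 2 opens the staged archive and pulls out only the member you need.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path

def stage_then_extract(source_stream, staging_path, member, dest_path):
    """Step 1: binary copy of the archive to a seekable staging file.
    Step 2: open the staged file as a zip and copy out one member."""
    with open(staging_path, "wb") as staged:
        shutil.copyfileobj(source_stream, staged)   # no decompression here
    with zipfile.ZipFile(staging_path) as zf:
        Path(dest_path).write_bytes(zf.read(member))

# Demo: an in-memory "HTTP response" stands in for the web source.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("report.csv", "id,value\n1,42\n")
archive.seek(0)

workdir = Path(tempfile.mkdtemp())
stage_then_extract(archive, workdir / "staged.zip", "report.csv",
                   workdir / "report.csv")
print((workdir / "report.csv").read_text())
```

In ADF terms, the first call corresponds to the HTTP (Binary) -> Blob (Binary) copy, and the second to the Blob (compressed) -> Blob (uncompressed) copy, which only works because the staged file supports random access.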
