Exploring Big Data: Course 8 – Orchestrating Big Data with Azure Data Factory

Big Data with Sam Lester

(This is course #8 in my review series on the Microsoft Professional Program in Big Data.)

Course #8 of 10 – Orchestrating Big Data with Azure Data Factory

Overview: The “Orchestrating Big Data with Azure Data Factory” course consists of four graded sections of videos and labs. The sections cover Azure Data Factory, creating and monitoring pipelines, and performing transformations in U-SQL and Hive. Each section includes a step-by-step lab exercise that walks you through building what the videos cover, followed by a quiz. There is also a final challenge in which you repeat steps similar to those in the labs and answer the final exam questions based on the output.

Time Spent / Level of Effort: As I do with each edX course, I opened the homework quizzes and the final exam in separate windows so I could spot the relevant content as it came up in the videos. After watching the videos for the first section on Azure Data Factory, I was able to answer the review questions correctly. I continued through the remaining videos and completed all of the graded sections in roughly 2 hours, using only the videos. I then moved on to the final challenge. Because I had skipped the step-by-step labs, the final challenge probably took longer than it would have otherwise, but I still completed it without much hassle. Overall, the course took roughly 3.5 to 4 hours. I finished it in one night with a final grade of 100%.

Course Highlight: The course was fairly short but packed with great content on Azure Data Factory and pipelines. Having now completed eight courses in the Big Data MPP, my highlight was starting to see the big picture of how these technologies fit together. This course built on work from several previous courses:

- creating Azure components such as storage accounts and Data Lake Analytics
- working with SQL Database and T-SQL queries
- writing U-SQL transformations

I’m looking forward to the capstone course, where we’ll get a chance to put all of this together on a challenging project.

Suggestions: The videos alone were enough to pass the course comfortably above the required 70%. I watched them all at double speed, with the quizzes and exam open in other tabs so I could watch for the tested content. For the final challenge, I completed the work in Azure and then verified my answers locally by opening the lab files in Power BI and performing the calculations there. For example, one of the final challenge questions asks for the total amount of an individual order. In Azure, we produced this total with a U-SQL transformation and a pipeline to process the data, and then queried the SQL Database for the result. To confirm my answer, I opened the text file in Power BI Desktop and created a calculation for that order number. I used this verification trick for two of the three final challenge questions.
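If you don't have Power BI handy, you can script the same cross-check. The sketch below is not from the course. It assumes the lab data is a comma-delimited text file with a header row. The column names `OrderID` and `Amount`, and the helper `order_total`, are hypothetical placeholders, so adjust them to match the actual lab file. It sums the amount for every row belonging to one order, which is the same total the U-SQL job and SQL Database query should produce.

```python
import csv
import io
from decimal import Decimal


def order_total(lines, order_id, id_col="OrderID", amount_col="Amount", delimiter=","):
    """Sum the amount column over every row whose id column equals order_id.

    id_col and amount_col are hypothetical names; change them to match
    the header row of the real lab file.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    total = Decimal("0")  # Decimal avoids floating-point rounding on currency
    for row in reader:
        if row[id_col] == order_id:
            total += Decimal(row[amount_col])
    return total


# Made-up sample data standing in for the lab's text file.
sample = io.StringIO(
    "OrderID,Amount\n"
    "1001,19.99\n"
    "1002,5.00\n"
    "1001,0.01\n"
)
print(order_total(sample, "1001"))  # prints 20.00
```

To run it against a real file, pass an open file handle instead of the `StringIO` sample, e.g. `with open(path, newline="") as f: order_total(f, "1001")`. Compare the result with the value returned by your SQL Database query.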

If you have taken this course in the past or are going through it now, please leave a comment and share your experience.

Sam Lester (MSFT)