Analyze Website logs using Azure Data Lake Analytics
Learn how to analyze website logs using Data Lake Analytics, especially on finding out which referrers ran into errors when they tried to visit the website.
Visual Studio 2015 or Visual Studio 2013.
Once Data Lake Tools for Visual Studio is installed, you will see a Data Lake item in the Tools menu in Visual Studio:
Basic knowledge of Data Lake Analytics and the Data Lake Tools for Visual Studio. To get started, see:
A Data Lake Analytics account. See Create an Azure Data Lake Analytics account.
Install the sample data. In the Azure Portal, open you Data Lake Analytics account and click Sample Scripts on the left menu, then click Copy Sample Data.
Connect to Azure
Before you can build and test any U-SQL scripts, you must first connect to Azure.
To connect to Data Lake Analytics
- Open Visual Studio.
- Click Data Lake > Options and Settings.
- Click Sign In, or Change User if someone has signed in, and follow the instructions.
- Click OK to close the Options and Settings dialog.
To browse your Data Lake Analytics accounts
- From Visual Studio, open Server Explorer by press CTRL+ALT+S.
- From Server Explorer, expand Azure, and then expand Data Lake Analytics. You shall see a list of your Data Lake Analytics accounts if there are any. You cannot create Data Lake Analytics accounts from the studio. To create an account, see Get Started with Azure Data Lake Analytics using Azure Portal or Get Started with Azure Data Lake Analytics using Azure PowerShell.
Develop U-SQL application
A U-SQL application is mostly a U-SQL script. To learn more about U-SQL, see Get started with U-SQL.
You can add addition user-defined operators to the application. For more information, see Develop U-SQL user defined operators for Data Lake Analytics jobs.
To create and submit a Data Lake Analytics job
Click the File > New > Project.
Select the U-SQL Project type.
Click OK. Visual studio creates a solution with a Script.usql file.
Enter the following script into the Script.usql file:
// Create a database for easy reuse, so you don't need to read from a file every time. CREATE DATABASE IF NOT EXISTS SampleDBTutorials; // Create a Table valued function. TVF ensures that your jobs fetch data from the weblog file with the correct schema. DROP FUNCTION IF EXISTS SampleDBTutorials.dbo.WeblogsView; CREATE FUNCTION SampleDBTutorials.dbo.WeblogsView() RETURNS @result TABLE ( s_date DateTime, s_time string, s_sitename string, cs_method string, cs_uristem string, cs_uriquery string, s_port int, cs_username string, c_ip string, cs_useragent string, cs_cookie string, cs_referer string, cs_host string, sc_status int, sc_substatus int, sc_win32status int, sc_bytes int, cs_bytes int, s_timetaken int ) AS BEGIN @result = EXTRACT s_date DateTime, s_time string, s_sitename string, cs_method string, cs_uristem string, cs_uriquery string, s_port int, cs_username string, c_ip string, cs_useragent string, cs_cookie string, cs_referer string, cs_host string, sc_status int, sc_substatus int, sc_win32status int, sc_bytes int, cs_bytes int, s_timetaken int FROM @"/Samples/Data/WebLog.log" USING Extractors.Text(delimiter:' '); RETURN; END; // Create a table for storing referrers and status DROP TABLE IF EXISTS SampleDBTutorials.dbo.ReferrersPerDay; @weblog = SampleDBTutorials.dbo.WeblogsView(); CREATE TABLE SampleDBTutorials.dbo.ReferrersPerDay ( INDEX idx1 CLUSTERED(Year ASC) DISTRIBUTED BY HASH(Year) ) AS SELECT s_date.Year AS Year, s_date.Month AS Month, s_date.Day AS Day, cs_referer, sc_status, COUNT(DISTINCT c_ip) AS cnt FROM @weblog GROUP BY s_date, cs_referer, sc_status;
To understand the U-SQL, see Get started with Data Lake Analytics U-SQL language.
Add a new U-SQL script to your project and enter the following:
// Query the referrers that ran into errors @content = SELECT * FROM SampleDBTutorials.dbo.ReferrersPerDay WHERE sc_status >=400 AND sc_status < 500; OUTPUT @content TO @"/Samples/Outputs/UnsuccessfulResponses.log" USING Outputters.Tsv();
Switch back to the first U-SQL script and next to the Submit button, specify your Analytics account.
From Solution Explorer, right click Script.usql, and then click Build Script. Verify the results in the Output pane.
From Solution Explorer, right click Script.usql, and then click Submit Script.
Verify the Analytics Account is the one where you want to run the job, and then click Submit. Submission results and job link are available in the Data Lake Tools for Visual Studio Results window when the submission is completed.
Wait until the job is completed successfully. If the job failed, it is most likely missing the source file. Please see the Prerequisite section of this tutorial. For additional troubleshooting information, see Monitor and troubleshoot Azure Data Lake Analytics jobs.
When the job is completed, you shall see the following screen:
Now repeat steps 7- 10 for Script1.usql.
To see the job output
- From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, expand Storage Accounts, right-click the default Data Lake Storage account, and then click Explorer.
- Double-click Samples to open the folder, and then double-click Outputs.
- Double-click UnsuccessfulResponses.log.
- You can also double-click the output file inside the graph view of the job in order to navigate directly to the output.
To get started with Data Lake Analytics using different tools, see: