Analyze Website logs using Azure Data Lake Analytics

Learn how to analyze website logs using Data Lake Analytics, especially on finding out which referrers ran into errors when they tried to visit the website.

Prerequisites

Connect to Azure

Before you can build and test any U-SQL scripts, you must first connect to Azure.

To connect to Data Lake Analytics

  1. Open Visual Studio.
  2. Click Data Lake > Options and Settings.
  3. Click Sign In, or Change User if someone has signed in, and follow the instructions.
  4. Click OK to close the Options and Settings dialog.

To browse your Data Lake Analytics accounts

  1. From Visual Studio, open Server Explorer by press CTRL+ALT+S.
  2. From Server Explorer, expand Azure, and then expand Data Lake Analytics. You shall see a list of your Data Lake Analytics accounts if there are any. You cannot create Data Lake Analytics accounts from the studio. To create an account, see Get Started with Azure Data Lake Analytics using Azure Portal or Get Started with Azure Data Lake Analytics using Azure PowerShell.

Develop U-SQL application

A U-SQL application is mostly a U-SQL script. To learn more about U-SQL, see Get started with U-SQL.

You can add addition user-defined operators to the application. For more information, see Develop U-SQL user defined operators for Data Lake Analytics jobs.

To create and submit a Data Lake Analytics job

  1. Click the File > New > Project.
  2. Select the U-SQL Project type.

    new U-SQL Visual Studio project

  3. Click OK. Visual studio creates a solution with a Script.usql file.
  4. Enter the following script into the Script.usql file:

     // Create a database for easy reuse, so you don't need to read from a file every time.
     CREATE DATABASE IF NOT EXISTS SampleDBTutorials;
    
     // Create a Table valued function. TVF ensures that your jobs fetch data from the weblog file with the correct schema.
     DROP FUNCTION IF EXISTS SampleDBTutorials.dbo.WeblogsView;
     CREATE FUNCTION SampleDBTutorials.dbo.WeblogsView()
     RETURNS @result TABLE
     (
         s_date DateTime,
         s_time string,
         s_sitename string,
         cs_method string,
         cs_uristem string,
         cs_uriquery string,
         s_port int,
         cs_username string,
         c_ip string,
         cs_useragent string,
         cs_cookie string,
         cs_referer string,
         cs_host string,
         sc_status int,
         sc_substatus int,
         sc_win32status int,
         sc_bytes int,
         cs_bytes int,
         s_timetaken int
     )
     AS
     BEGIN
    
         @result = EXTRACT
             s_date DateTime,
             s_time string,
             s_sitename string,
             cs_method string,
             cs_uristem string,
             cs_uriquery string,
             s_port int,
             cs_username string,
             c_ip string,
             cs_useragent string,
             cs_cookie string,
             cs_referer string,
             cs_host string,
             sc_status int,
             sc_substatus int,
             sc_win32status int,
             sc_bytes int,
             cs_bytes int,
             s_timetaken int
         FROM @"/Samples/Data/WebLog.log"
         USING Extractors.Text(delimiter:' ');
         RETURN;
     END;
    
     // Create a table for storing referrers and status
     DROP TABLE IF EXISTS SampleDBTutorials.dbo.ReferrersPerDay;
     @weblog = SampleDBTutorials.dbo.WeblogsView();
     CREATE TABLE SampleDBTutorials.dbo.ReferrersPerDay
     (
         INDEX idx1
         CLUSTERED(Year ASC)
         DISTRIBUTED BY HASH(Year)
     ) AS
    
     SELECT s_date.Year AS Year,
         s_date.Month AS Month,
         s_date.Day AS Day,
         cs_referer,
         sc_status,
         COUNT(DISTINCT c_ip) AS cnt
     FROM @weblog
     GROUP BY s_date,
             cs_referer,
             sc_status;
    

    To understand the U-SQL, see Get started with Data Lake Analytics U-SQL language.

  5. Add a new U-SQL script to your project and enter the following:

     // Query the referrers that ran into errors
     @content =
         SELECT *
         FROM SampleDBTutorials.dbo.ReferrersPerDay
         WHERE sc_status >=400 AND sc_status < 500;
    
     OUTPUT @content
     TO @"/Samples/Outputs/UnsuccessfulResponses.log"
     USING Outputters.Tsv();
    
  6. Switch back to the first U-SQL script and next to the Submit button, specify your Analytics account.
  7. From Solution Explorer, right click Script.usql, and then click Build Script. Verify the results in the Output pane.
  8. From Solution Explorer, right click Script.usql, and then click Submit Script.
  9. Verify the Analytics Account is the one where you want to run the job, and then click Submit. Submission results and job link are available in the Data Lake Tools for Visual Studio Results window when the submission is completed.
  10. Wait until the job is completed successfully. If the job failed, it is most likely missing the source file. Please see the Prerequisite section of this tutorial. For additional troubleshooting information, see Monitor and troubleshoot Azure Data Lake Analytics jobs.

    When the job is completed, you shall see the following screen:

    data lake analytics analyze weblogs website logs

  11. Now repeat steps 7- 10 for Script1.usql.

To see the job output

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, expand Storage Accounts, right-click the default Data Lake Storage account, and then click Explorer.
  2. Double-click Samples to open the folder, and then double-click Outputs.
  3. Double-click UnsuccessfulResponsees.log.
  4. You can also double-click the output file inside the graph view of the job in order to navigate directly to the output.

See also

To get started with Data Lake Analytics using different tools, see: