Analytics and visualization samples for Microsoft Academic Graph
Illustrates how to perform analytics and visualization for Microsoft Academic Graph using Data Lake Analytics (U-SQL) and Power BI.
Complete these tasks before beginning this tutorial:
- Set up provisioning of Microsoft Academic Graph to an Azure blob storage account
- Set up Azure Data Lake Analytics for Microsoft Academic Graph
- Install the Microsoft Power BI Desktop client
- Install Visual Studio 2017 or Visual Studio 2015 with Data Lake tools
Gather the information that you need
Before you begin, you should have these items of information:
✔️ The name of your Azure Storage (AS) account containing the MAG dataset from Get Microsoft Academic Graph on Azure storage.
✔️ The name of your Azure Data Lake Analytics (ADLA) service from Set up Azure Data Lake Analytics.
✔️ The name of your Azure Data Lake Storage (ADLS) from Set up Azure Data Lake Analytics.
✔️ The name of the container in your Azure Storage (AS) account containing the MAG dataset.
Define functions to extract MAG data
In the prerequisite Set up Azure Data Lake Analytics, you added the Azure Storage (AS) account created for MAG provisioning as a data source for the Azure Data Lake Analytics (ADLA) service. In this section, you submit an ADLA job to create functions that extract MAG data from Azure Storage (AS).
Follow the instructions in Define MAG functions.
- Field of Study Top Authors
- Field of Study Entity Counts
- Field of Study Top Entities
- Conference Top Authors By Static Rank
- Conference Paper Statistics
- Conference Top Papers
- Conference Top Authors
- Conference Top Institutions
- Conference Memory of References
- Conference Top Referenced Venues
- Conference Top Citing Venues
- Organization Insight
Getting started with sample projects
Download or clone the samples repository
Open the solution /src/AcademicAnalytics.sln
Each tutorial should include a U-SQL script (.usql), a Power BI report (.pbix), a Power BI template (.pbit), and a README explaining the tutorial.
In the U-SQL script, replace the placeholder values, such as <MagContainer>, with the values that you collected while completing the prerequisites of this sample:
- The name of your Azure Storage (AS) account containing the MAG dataset.
- <MagContainer>: The container name in the Azure Storage (AS) account containing the MAG dataset, usually in the form of mag-yyyy-mm-dd.
Although each tutorial is different, running the U-SQL script as is and filling out the Power BI template with the same U-SQL parameters should give you a Power BI report whose visualizations match the example report included in the tutorial. Because the Microsoft Academic Graph is constantly improving, different graph versions may give you slightly different results.
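The placeholder replacement described above is normally done by hand in the editor. As a rough illustration only, the following Python sketch shows one way to fill placeholders in a script's text and to check that a container name has the usual mag-yyyy-mm-dd shape. The function names, the "Mag" prefix rule for placeholders, and the sample DECLARE line are assumptions made for this sketch and are not taken from the samples repository.

```python
import re

# Assumption: placeholders look like <MagSomething>; only <MagContainer> is
# named in the text. Restricting to the "Mag" prefix avoids touching U-SQL
# generic types such as SqlArray<string>.
PLACEHOLDER = re.compile(r"<(Mag[A-Za-z]+)>")

# The text says containers are usually named mag-yyyy-mm-dd.
CONTAINER_FORM = re.compile(r"mag-\d{4}-\d{2}-\d{2}")


def looks_like_mag_container(name):
    """Return True if name has the usual mag-yyyy-mm-dd shape."""
    return CONTAINER_FORM.fullmatch(name) is not None


def fill_placeholders(script_text, values):
    """Replace each <Mag...> placeholder with its collected value.

    Raises ValueError when a placeholder has no value, so a half-edited
    script is caught before it is submitted.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise ValueError(f"no value supplied for <{name}>")
        return values[name]

    return PLACEHOLDER.sub(substitute, script_text)


if __name__ == "__main__":
    # Hypothetical script line, for illustration only.
    line = 'DECLARE @blobContainer string = "<MagContainer>";'
    print(fill_placeholders(line, {"MagContainer": "mag-2019-01-25"}))
```

Raising an error on an unknown placeholder, rather than leaving it in place, is a deliberate choice here: a script that still contains a placeholder would otherwise only fail later, after it has been submitted.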
Working with U-SQL scripts
How to run U-SQL scripts
Make sure you have selected your Data Lake account
Build the script first to validate syntax
Submit your script to your Data Lake account
How to view U-SQL results in Azure portal
Using Power BI
Make sure the U-SQL script finished successfully
Open the corresponding Power BI template (.pbit) from File Explorer (Visual Studio doesn't recognize Power BI files)
Enter your ADL information and the parameters corresponding to your scripts
Make sure the letter case of the parameters matches your script, then click Load
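Case mismatches between the script and the template are easy to overlook. The sketch below is one possible way to spot them, assuming you have written down the parameter values used in the U-SQL script and those typed into the Power BI template as two Python dictionaries. The function name and the parameter name "conference" are hypothetical and exist only for this illustration.

```python
def find_case_mismatches(script_params, report_params):
    """Return names whose values differ between the two dicts only by case.

    Values that are identical, or that differ by more than letter case,
    are not reported here.
    """
    mismatches = []
    for name, script_value in script_params.items():
        report_value = report_params.get(name)
        if report_value is None or report_value == script_value:
            continue
        # casefold() gives a case-insensitive comparison.
        if report_value.casefold() == script_value.casefold():
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    # Hypothetical parameter name and values, for illustration only.
    script = {"conference": "WWW"}
    report = {"conference": "www"}
    print(find_case_mismatches(script, report))
```

Any name the function returns is a value to retype in the template so that it matches the script exactly.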
- Get started with Azure Data Lake Analytics using Azure portal
- Develop U-SQL scripts by using Data Lake Tools for Visual Studio
- Get started with U-SQL
- Deep Dive into Query Parameters and Power BI Templates
- Manage Azure Data Lake Store resources by using Storage Explorer
- Scalable Data Science with Azure Data Lake: An end-to-end walk-through
- Microsoft Academic Website