Get started with Azure Data Catalog
Azure Data Catalog is a fully managed cloud service that serves as a system of registration and system of discovery for enterprise data assets. For a detailed overview, see What is Azure Data Catalog.
This tutorial helps you get started with Azure Data Catalog. You perform the following procedures in this tutorial:
|Provision data catalog||In this procedure, you provision or set up Azure Data Catalog. You do this step only if the catalog has not been set up before. You can have only one data catalog per organization (Microsoft Azure Active Directory domain) even though there are multiple subscriptions associated with your Azure account.|
|Register data assets||In this procedure, you register data assets from the AdventureWorks2014 sample database with the data catalog. Registration is the process of extracting key structural metadata such as names, types, and locations from the data source and copying that metadata to the catalog. The data source and data assets remain where they are, but the metadata is used by the catalog to make them more easily discoverable and understandable.|
|Discover data assets||In this procedure, you use the Azure Data Catalog portal to discover data assets that were registered in the previous step. After a data source has been registered with Azure Data Catalog, its metadata is indexed by the service so that users can easily search for the data they need.|
|Annotate data assets||In this procedure, you provide annotations (information such as descriptions, tags, documentation, or experts) for the data assets. This information supplements the metadata extracted from the data source and makes the data source more understandable to more people.|
|Connect to data assets||In this procedure, you open data assets in integrated client tools (such as Excel and SQL Server Data Tools) and a non-integrated tool (SQL Server Management Studio).|
|Manage data assets||In this procedure, you set up security for your data assets. Data Catalog does not give users access to the data itself. The owner of the data source controls data access. With Data Catalog, you can discover data sources and view the metadata related to the sources registered in the catalog. There may be situations, however, where data sources should be visible only to specific users or to members of specific groups. For these scenarios, you can use Data Catalog to take ownership of registered data assets within the catalog and control the visibility of the assets you own.|
|Remove data assets||In this procedure, you learn how to remove data assets from the data catalog.|
To set up Azure Data Catalog, you must be the owner or co-owner of an Azure subscription.
Azure subscriptions help you organize access to cloud service resources like Azure Data Catalog. They also help you control how resource usage is reported, billed, and paid for. Each subscription can have a different billing and payment setup, so you can have different subscriptions and different plans by department, project, regional office, and so on. Every cloud service belongs to a subscription, and you need to have a subscription before setting up Azure Data Catalog. To learn more, see Manage accounts, subscriptions, and administrative roles.
If you don't have a subscription, you can create a free trial account in just a couple of minutes. See Free Trial for details.
Azure Active Directory
To set up Azure Data Catalog, you must be signed in with an Azure Active Directory (Azure AD) user account. You must be the owner or co-owner of an Azure subscription.
Azure AD provides an easy way for your business to manage identity and access, both in the cloud and on-premises. You can use a single work or school account to sign in to any cloud or on-premises web application. Azure Data Catalog uses Azure AD to authenticate sign-in. To learn more, see What is Azure Active Directory.
Azure Active Directory policy configuration
You may encounter a situation where you can sign in to the Azure Data Catalog portal, but when you attempt to sign in to the data source registration tool, you encounter an error message that prevents you from signing in. This error may occur when you are on the company network or when you are connecting from outside the company network.
The registration tool uses forms authentication to validate user sign-ins against Azure Active Directory. For successful sign-in, an Azure Active Directory administrator must enable forms authentication in the global authentication policy.
With the global authentication policy, you can enable authentication methods separately for intranet and extranet connections. Sign-in errors may occur if forms authentication is not enabled for the network from which you're connecting.
For more information, see Configuring authentication policies.
Provision data catalog
You can provision only one data catalog per organization (Azure Active Directory domain). Therefore, if the owner or co-owner of an Azure subscription who belongs to this Azure Active Directory domain has already created a catalog, you will not be able to create a catalog again even if you have multiple Azure subscriptions. To test whether a data catalog has been created by a user in your Azure Active Directory domain, go to the Azure Data Catalog home page and verify whether you see the catalog. If a catalog has already been created for you, skip the following procedure and go to the next section.
Go to the Data Catalog service page and click Get started.
Sign in with a user account that is the owner or co-owner of an Azure subscription. You see the following page after signing in.
Specify a name for the data catalog, the subscription you want to use, and the location for the catalog.
Expand Pricing and select an Azure Data Catalog edition (Free or Standard).
Expand Catalog Users and click Add to add users for the data catalog. You are automatically added to this group.
Expand Catalog Administrators and click Add to add additional administrators for the data catalog. You are automatically added to this group.
Click Create Catalog to create the data catalog for your organization. You see the home page for the data catalog after it is created.
Find a data catalog in the Azure portal
On a separate tab in the web browser or in a separate web browser window, go to the Azure portal and sign in with the same account that you used to create the data catalog in the previous step.
Select Browse and then click Data Catalog.
You see the data catalog you created.
Click the catalog that you created. You see the Data Catalog blade in the portal.
You can view properties of the data catalog and update them. For example, click Pricing tier and change the edition.
Adventure Works sample database
In this tutorial, you register data assets (tables) from the AdventureWorks2014 sample database for the SQL Server Database Engine, but you can use any supported data source if you would prefer to work with data that is familiar and relevant to your role. For a list of supported data sources, see Supported data sources.
Install the Adventure Works 2014 OLTP database
The Adventure Works database supports standard online transaction-processing scenarios for a fictitious bicycle manufacturer (Adventure Works Cycles), which includes products, sales, and purchasing. In this tutorial, you register information about products into Azure Data Catalog.
To install the Adventure Works sample database:
- Download Adventure Works 2014 Full Database Backup.zip on CodePlex.
- To restore the database on your machine, follow the instructions in Restore a Database Backup by using SQL Server Management Studio, or follow these steps:
- Open SQL Server Management Studio and connect to the SQL Server Database Engine.
- Right-click Databases and click Restore Database.
- Under Restore Database, click the Device option for Source and click Browse.
- Under Select backup devices, click Add.
- Go to the folder where you have the AdventureWorks2014.bak file, select the file, and click OK to close the Locate Backup File dialog box.
- Click OK to close the Select backup devices dialog box.
- Click OK to close the Restore Database dialog box.
You can now register data assets from the Adventure Works sample database by using Azure Data Catalog.
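The restore steps above can also be expressed as a single T-SQL statement run from a query window. The following is a minimal sketch that only builds that statement as text. The helper name build_restore_statement and the backup path are assumptions made for illustration, not part of the tutorial.

```python
def build_restore_statement(database: str, backup_path: str) -> str:
    """Return a T-SQL RESTORE DATABASE statement for a full backup file."""
    # Bracket-quote the database name; a closing bracket is escaped by doubling.
    quoted_name = "[" + database.replace("]", "]]") + "]"
    # Inside a T-SQL string literal, a single quote is escaped by doubling.
    quoted_path = "N'" + backup_path.replace("'", "''") + "'"
    return f"RESTORE DATABASE {quoted_name} FROM DISK = {quoted_path};"


# Hypothetical location of the downloaded backup file.
statement = build_restore_statement(
    "AdventureWorks2014",
    r"C:\Backups\AdventureWorks2014.bak",
)
print(statement)
```

If the file locations recorded in the backup do not exist on your machine, the statement typically also needs a WITH MOVE clause for each database file. The dialog-based steps handle that for you.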
Register data assets
In this exercise, you use the registration tool to register data assets from the Adventure Works database with the catalog. Registration is the process of extracting key structural metadata such as names, types, and locations from the data source and the assets it contains, and copying that metadata to the catalog. The data source and data assets remain where they are, but the metadata is used by the catalog to make them more easily discoverable and understandable.
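To make the idea of structural metadata concrete, here is a small analogy that uses Python's built-in sqlite3 module rather than SQL Server. It is not how the registration tool is implemented. The table definitions, the location string, and the function name extract_structural_metadata are assumptions chosen for illustration. The point it shows is that only names, types, and a location are collected, and no rows are read.

```python
import sqlite3


def extract_structural_metadata(conn, location):
    """Collect table names, column names, and declared types. No rows are read."""
    assets = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [
            {"name": row[1], "type": row[2]}
            for row in conn.execute(f'PRAGMA table_info("{table_name}")')
        ]
        assets.append({"name": table_name, "location": location, "columns": columns})
    return assets


# A stand-in source with two tables loosely modeled on the tutorial's examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL)")
conn.execute("CREATE TABLE ProductCategory (ProductCategoryID INTEGER, Name TEXT)")

catalog_entries = extract_structural_metadata(conn, location="memory/AdventureWorksDemo")
for entry in catalog_entries:
    print(entry["name"], [column["name"] for column in entry["columns"]])
```

The source database is unchanged by this process. The resulting entries are a separate copy of the structure, which is what a catalog would index.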
Register a data source
Go to the Azure Data Catalog home page and click Publish Data.
Click Launch Application to download, install, and run the registration tool on your computer.
On the Welcome page, click Sign in and enter your credentials.
On the Microsoft Azure Data Catalog page, click SQL Server and Next.
Enter the SQL Server connection properties for AdventureWorks2014 and click CONNECT.
Register the metadata of your data asset. In this example, you register Production/Product objects from the AdventureWorks Production namespace:
In the Server Hierarchy tree, expand AdventureWorks2014 and click Production.
Select Product, ProductCategory, ProductDescription, and ProductPhoto by using Ctrl+click.
Click the move selected arrow (>). This action moves all selected objects into the Objects to be registered list.
Select Include a Preview to include a snapshot preview of the data. The snapshot includes up to 20 records from each table, and it is copied into the catalog.
Select Include Data Profile to include a snapshot of the object statistics for the data profile (for example: minimum, maximum, and average values for a column, number of rows).
In the Add tags field, enter adventure works, cycles. This action adds search tags for these data assets. Tags are a great way to help users find a registered data source.
Specify the name of an expert on this data (optional).
Click REGISTER. Azure Data Catalog registers your selected objects. In this exercise, the selected objects from Adventure Works are registered. The registration tool extracts metadata from the data asset and copies that data into the Azure Data Catalog service. The data remains where it currently resides, and it remains under the control of the administrators and policies of the current system.
To see your registered data source objects, click View Portal. In the Azure Data Catalog portal, confirm that you see all four tables and the database in the grid view.
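The Include a Preview and Include Data Profile options in the steps above can be pictured with a short sketch. It assumes rows are available as a list of dictionaries and uses made-up sample values. The function names and the profile layout are illustrative assumptions, not the format the service stores.

```python
from statistics import mean

PREVIEW_LIMIT = 20  # the tutorial states that a preview holds up to 20 records


def build_preview(rows, limit=PREVIEW_LIMIT):
    """Return a snapshot of at most `limit` rows."""
    return rows[:limit]


def build_profile(rows, numeric_columns):
    """Return the row count plus min, max, and average for each numeric column."""
    profile = {"row_count": len(rows), "columns": {}}
    for column in numeric_columns:
        values = [row[column] for row in rows if row[column] is not None]
        if values:
            profile["columns"][column] = {
                "min": min(values),
                "max": max(values),
                "avg": mean(values),
            }
    return profile


# Made-up sample data: 25 rows with prices 10.0, 20.0, ..., 250.0.
rows = [{"ProductID": i, "ListPrice": float(i * 10)} for i in range(1, 26)]

preview = build_preview(rows)
profile = build_profile(rows, ["ListPrice"])
print(len(preview), profile)
```

The preview is a copy of actual records, while the profile contains only aggregate statistics. That difference is worth keeping in mind when deciding which options suit a sensitive table.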
In this exercise, you registered objects from the Adventure Works sample database so that they can be easily discovered by users across your organization. In the next exercise, you learn how to discover registered data assets.
Discover data assets
Discovery in Azure Data Catalog uses two primary mechanisms: searching and filtering.
Searching is designed to be both intuitive and powerful. By default, search terms are matched against any property in the catalog, including user-provided annotations.
Filtering is designed to complement searching. You can select specific characteristics such as experts, data source type, object type, and tags to view matching data assets and to constrain search results to matching assets.
By using a combination of searching and filtering, you can quickly navigate the data sources that have been registered with Azure Data Catalog to discover the data assets you need.
In this exercise, you use the Azure Data Catalog portal to discover data assets you registered in the previous exercise. See Data Catalog Search syntax reference for details about search syntax.
Following are a few examples for discovering data assets in the catalog.
Discover data assets with basic search
Basic search helps you search a catalog by using one or more search terms. Results are any assets that match on any property with one or more of the terms specified.
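The "any term, any property" behavior can be sketched as follows. This is a simplified model with whole-word, case-insensitive matching and in-memory sample assets. The real service's matching rules may differ, and the Customer asset is a hypothetical addition used only to show a non-matching case.

```python
import re


def words_of(asset):
    """Return the set of lowercase words found in all property values."""
    words = set()
    for value in asset.values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            words.update(re.findall(r"\w+", str(item).lower()))
    return words


def basic_search(assets, query):
    """Return assets where at least one query term matches a word in any property."""
    terms = re.findall(r"\w+", query.lower())
    return [asset for asset in assets if any(term in words_of(asset) for term in terms)]


assets = [
    {"name": "Product", "objectType": "table", "tags": ["adventure works", "cycles"]},
    {"name": "AdventureWorks2014", "objectType": "database", "tags": ["adventure works", "cycles"]},
    {"name": "Customer", "objectType": "table", "tags": ["sales"]},  # hypothetical asset
]

print([asset["name"] for asset in basic_search(assets, "cycles")])
print([asset["name"] for asset in basic_search(assets, "sales product")])
```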
Click Home in the Azure Data Catalog portal. If you have closed the web browser, go to the Azure Data Catalog home page.
In the search box, enter cycles and press ENTER.
Confirm that you see all four tables and the database (AdventureWorks2014) in the results. You can switch between grid view and list view by clicking buttons on the toolbar. Notice that the search keyword is highlighted in the search results because the Highlight option is ON. You can also specify the number of results per page in search results.
The Searches panel is on the left and the Properties panel is on the right. On the Searches panel, you can change search criteria and filter results. The Properties panel displays properties of a selected object in the grid or list.
Click Product in the search results. Click the Preview, Columns, Data Profile, and Documentation tabs, or click the arrow to expand the bottom pane.
On the Preview tab, you see a preview of the data in the Product table.
Click the Columns tab to find details about columns (such as name and data type) in the data asset.
Click the Data Profile tab to see the profiling of data (for example: number of rows, size of data, or minimum value in a column) in the data asset.
Filter the results by using Filters on the left. For example, click Table for Object Type, and you see only the four tables, not the database.
Discover data assets with property scoping
Property scoping helps you discover data assets where the search term is matched with the specified property.
Clear the Table filter under Object Type in Filters.
In the search box, enter tags:cycles and press ENTER. See Data Catalog Search syntax reference for all the properties you can use for searching the data catalog.
Confirm that you see all four tables and the database (AdventureWorks2014) in the results.
Save the search
In the Searches pane in the Current Search section, enter a name for the search and click Save.
Confirm that the saved search shows up under Saved Searches.
Select one of the actions you can take on the saved search (Rename, Delete, Save As Default search).
You can broaden or narrow your search with Boolean operators.
In the search box, enter tags:cycles AND objectType:table, and press ENTER.
Confirm that you see only tables (not the database) in the results.
Grouping with parentheses
By grouping with parentheses, you can group parts of the query to achieve logical isolation, especially along with Boolean operators.
In the search box, enter name:product AND (tags:cycles AND objectType:table), and press ENTER.
Confirm that you see only the Product table in the search results.
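The three queries used so far (property scoping, a Boolean AND, and a parenthesized group) can be modeled as composable predicates. The sketch below is a conceptual model and not the service's query engine. It assumes whole-word, case-insensitive matching, and the helper names scoped and all_of are invented for illustration.

```python
import re


def scoped(prop, term):
    """Build a predicate for prop:term that matches whole words in one property."""
    def predicate(asset):
        value = asset.get(prop, "")
        items = value if isinstance(value, list) else [value]
        words = set()
        for item in items:
            words.update(re.findall(r"\w+", str(item).lower()))
        return term.lower() in words
    return predicate


def all_of(*predicates):
    """Build a predicate for AND: every sub-predicate must match."""
    return lambda asset: all(predicate(asset) for predicate in predicates)


tags = ["adventure works", "cycles"]
assets = [
    {"name": "Product", "objectType": "table", "tags": tags},
    {"name": "ProductCategory", "objectType": "table", "tags": tags},
    {"name": "ProductDescription", "objectType": "table", "tags": tags},
    {"name": "ProductPhoto", "objectType": "table", "tags": tags},
    {"name": "AdventureWorks2014", "objectType": "database", "tags": tags},
]

# tags:cycles
by_tag = [a["name"] for a in assets if scoped("tags", "cycles")(a)]

# tags:cycles AND objectType:table
tables_only = all_of(scoped("tags", "cycles"), scoped("objectType", "table"))
by_tag_and_type = [a["name"] for a in assets if tables_only(a)]

# name:product AND (tags:cycles AND objectType:table)
# The parenthesized group becomes a nested predicate.
grouped = all_of(scoped("name", "product"), tables_only)
by_group = [a["name"] for a in assets if grouped(a)]

print(by_tag, by_tag_and_type, by_group)
```

Nesting one predicate inside another is the code equivalent of grouping with parentheses: the inner condition is evaluated as a unit before it is combined with the outer term.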
With comparison operators, you can use comparisons other than equality for properties that have numeric and date data types.
In the search box, enter
Clear the Table filter under Object Type.
Confirm that you see the Product, ProductCategory, ProductDescription, and ProductPhoto tables and the AdventureWorks2014 database you registered in search results.
Annotate data assets
In this exercise, you use the Azure Data Catalog portal to annotate data assets you have previously registered in the catalog, adding information such as descriptions, tags, or experts. The annotations supplement and enhance the structural metadata extracted from the data source during registration and make the data assets much easier to discover and understand.
In this exercise, you annotate a single data asset (ProductPhoto). You add a friendly name and description to the ProductPhoto data asset.
Go to the Azure Data Catalog home page and search with tags:cycles to find the data assets you have registered.
Click ProductPhoto in search results.
Enter Product images for Friendly Name and Product photos for marketing materials for the Description.
The Description helps others discover and understand why and how to use the selected data asset. You can also add more tags and view columns. Now you can try searching and filtering to discover data assets by using the descriptive metadata you’ve added to the catalog.
You can also do the following on this page:
Add experts for the data asset. Click Add in the Experts area.
Add tags at the dataset level. Click Add in the Tags area. A tag can be a user tag or a glossary tag. The Standard Edition of Data Catalog includes a business glossary that helps catalog administrators define a central business taxonomy. Catalog users can then annotate data assets with glossary terms. For more information, see How to set up the Business Glossary for Governed Tagging.
Add tags at the column level. Click Add under Tags for the column you want to annotate.
Add description at the column level. Enter Description for a column. You can also view the description metadata extracted from the data source.
Add Request access information that shows users how to request access to the data asset.
Choose the Documentation tab and provide documentation for the data asset. With Azure Data Catalog documentation, you can use your data catalog as a content repository to create a complete narrative of your data assets.
You can also add an annotation to multiple data assets. For example, you can select all the data assets you registered and specify an expert for them.
Azure Data Catalog supports a crowd-sourcing approach to annotations. Any Data Catalog user can add tags (user or glossary), descriptions, and other metadata, so that any user with a perspective on a data asset and its use can have that perspective captured and available to other users.
See How to annotate data assets for detailed information about annotating data assets.
Connect to data assets
In this exercise, you open data assets in an integrated client tool (Excel) and a non-integrated tool (SQL Server Management Studio) by using connection information.
It’s important to remember that Azure Data Catalog doesn’t give you access to the actual data source—it simply makes it easier for you to discover and understand it. When you connect to a data source, the client application you choose uses your Windows credentials or prompts you for credentials as necessary. If you have not previously been granted access to the data source, you need to be granted access before you can connect.
Connect to a data asset from Excel
Select Product from search results. Click Open In on the toolbar and click Excel.
Click Open in the download pop-up window. This experience may vary depending on the browser.
In the Microsoft Excel Security Notice window, click Enable.
Keep the defaults in the Import Data dialog box and click OK.
View the data source in Excel.
In this exercise, you connected to data assets discovered by using Azure Data Catalog. With the Azure Data Catalog portal, you can connect directly by using the client applications integrated into the Open in menu. You can also connect with any application you choose by using the connection location information included in the asset metadata. For example, you can use SQL Server Management Studio to connect to the AdventureWorks2014 database to access the data in the data assets registered in this tutorial.
Open SQL Server Management Studio.
In the Connect to Server dialog box, enter the server name from the Properties pane in the Azure Data Catalog portal.
Use appropriate authentication and credentials to access the data asset. If you don't have access, use information in the Request Access field to get it.
Click View Connection Strings to view and copy ADO.NET, ODBC, and OLEDB connection strings to the clipboard for use in your application.
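For orientation, the three connection string styles for a SQL Server database using Windows authentication commonly look like the strings built below. This is a sketch under assumptions. The server name is hypothetical, the ODBC driver and OLE DB provider names depend on what is installed on your machine, and the strings the portal shows for your asset may differ in detail.

```python
def build_connection_strings(server: str, database: str) -> dict:
    """Return typical SQL Server connection strings that use Windows authentication."""
    return {
        "ADO.NET": (
            f"Data Source={server};Initial Catalog={database};Integrated Security=True"
        ),
        # Doubled braces produce literal braces around the ODBC driver name.
        "ODBC": (
            f"Driver={{SQL Server}};Server={server};Database={database};"
            "Trusted_Connection=Yes"
        ),
        "OLEDB": (
            f"Provider=SQLOLEDB;Data Source={server};Initial Catalog={database};"
            "Integrated Security=SSPI"
        ),
    }


# "myserver" is a placeholder. Use the server name from the Properties pane.
strings = build_connection_strings("myserver", "AdventureWorks2014")
for kind, value in strings.items():
    print(f"{kind}: {value}")
```

None of these strings contain credentials. With integrated security, the client connects as the signed-in Windows user, which is consistent with the catalog not granting data access itself.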
Manage data assets
In this step, you see how to set up security for your data assets. Data Catalog does not give users access to the data itself. The owner of the data source controls data access.
You can use Data Catalog to discover data sources and to view the metadata related to the sources registered in the catalog. There may be situations, however, where data sources should only be visible to specific users or to members of specific groups. For these scenarios, you can use Data Catalog to take ownership of registered data assets within the catalog, and to then control the visibility of the assets you own.
The management capabilities described in this exercise are available only in the Standard Edition of Azure Data Catalog, not in the Free Edition. In Azure Data Catalog, you can take ownership of data assets, add co-owners to data assets, and set the visibility of data assets.
Take ownership of data assets and restrict visibility
Go to the Azure Data Catalog home page. In the Search text box, enter tags:cycles and press ENTER.
Click an item in the result list and click Take Ownership on the toolbar.
In the Management section of the Properties panel, click Take Ownership.
To restrict visibility, choose Owners & These Users in the Visibility section and click Add. Enter user email addresses in the text box and press ENTER.
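The ownership and visibility rules described in this exercise can be summarized with a small conceptual model. It is a sketch of the behavior as the tutorial describes it and not the service's implementation. The class, its field names, and the example email addresses are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogAsset:
    name: str
    owners: set = field(default_factory=set)
    restricted: bool = False  # True models the "Owners & These Users" setting
    allowed_users: set = field(default_factory=set)

    def take_ownership(self, user):
        """Add the user as an owner of this asset."""
        self.owners.add(user)

    def restrict_to(self, acting_user, allowed):
        """Limit visibility to owners plus the listed users. Only owners may do this."""
        if acting_user not in self.owners:
            raise PermissionError("only an owner can change visibility")
        self.restricted = True
        self.allowed_users = set(allowed)

    def is_visible_to(self, user):
        """Unrestricted assets are visible to every catalog user."""
        if not self.restricted:
            return True
        return user in self.owners or user in self.allowed_users


# Hypothetical users.
asset = CatalogAsset("Product")
visible_before = asset.is_visible_to("analyst@example.com")

asset.take_ownership("owner@example.com")
asset.restrict_to("owner@example.com", ["teammate@example.com"])

print(visible_before, asset.is_visible_to("analyst@example.com"))
```

Visibility here controls only whether the asset's metadata can be discovered in the catalog. Access to the underlying data remains governed by the data source.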
Remove data assets
In this exercise, you use the Azure Data Catalog portal to remove preview data from registered data assets and delete data assets from the catalog.
In Azure Data Catalog, you can delete an individual asset or delete multiple assets.
Go to the Azure Data Catalog home page.
In the Search text box, enter tags:cycles and press ENTER.
Select an item in the result list and click Delete on the toolbar.
If you are using the list view, the check box is to the left of the item.
You can also select multiple data assets and delete them together.
The default behavior of the catalog is to allow any user to register any data source, and to allow any user to delete any data asset that has been registered. The management capabilities included in the Standard Edition of Azure Data Catalog provide additional options for taking ownership of assets, restricting who can discover assets, and restricting who can delete assets.
In this tutorial, you explored essential capabilities of Azure Data Catalog, including registering, annotating, discovering, and managing enterprise data assets. Now that you’ve completed the tutorial, it’s time to get started. You can begin today by registering the data sources you and your team rely on, and by inviting colleagues to use the catalog.