Analyze data with a serverless SQL pool
In this tutorial, you'll learn how to analyze data with serverless SQL pool.
The Built-in serverless SQL pool
Serverless SQL pools let you use SQL without having to reserve capacity. Billing for a serverless SQL pool is based on the amount of data processed to run the query and not the number of nodes used to run the query.
Every workspace comes with a pre-configured serverless SQL pool called Built-in.
Analyze NYC Taxi data with a serverless SQL pool
Note
Make sure you have placed the sample data into the primary storage account
In Synapse Studio, go to the Develop hub
Create a new SQL script.
Paste the following code into the script.
SELECT TOP 100 * FROM OPENROWSET( BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet', FORMAT='PARQUET' ) AS [result]Click Run.
Data exploration is just a simplified scenario where you can understand the basic characteristics of your data. Learn more about data exploration and analysis in this tutorial.
Create data exploration database
You can browse the content of the files directly via master database. For some simple data exploration scenarios, you don't need to create a separate database.
However, as you continue data exploration, you might want to create some utility objects, such as:
- External data sources that represent the named references for storage accounts.
- Database scoped credentials that enable you to specify how to authenticate to external data source.
- Database users with the permissions to access some data sources or database objects.
- Utility views, procedures, and functions that you can use in the queries.
Use the
masterdatabase to create a separate database for custom database objects. Custom database objects, cannot be created in themasterdatabase.CREATE DATABASE DataExplorationDB COLLATE Latin1_General_100_BIN2_UTF8Important
Use a collation with
_UTF8suffix to ensure that UTF-8 text is properly converted toVARCHARcolumns.Latin1_General_100_BIN2_UTF8provides the best performance in the queries that read data from Parquet files and cosmos Db containers.Switch to
DataExplorationDBwhere you can create utility objects such as credentials and data sources.CREATE EXTERNAL DATA SOURCE ContosoLake WITH ( LOCATION = 'https://contosolake.dfs.core.windows.net')Note
An external data source can be created without a credential. If a credential does not exist, the caller's identity will be used to access the external data source.
Optionally, use the 'master' database to create a login for a user in
DataExplorationDBthat will access external data:CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';Then create a database user in
DataExplorationDBfor the login and grant theADMINISTER DATABASE BULK OPERATIONSpermission.CREATE USER data_explorer FOR LOGIN data_explorer; GO GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer; GOExplore the content of the file using the relative path and the data source:
SELECT TOP 100 * FROM OPENROWSET( BULK '/users/NYCTripSmall.parquet', DATA_SOURCE = 'ContosoLake', FORMAT='PARQUET' ) AS [result]
Data exploration database is just a simple placeholder where you can store your utility objects. Synapse SQL pool enables you to do much more and create a Logical Data Warehouse - a relational layer built on top of Azure data sources. Learn more about building Logical Data Warehouse in this tutorial.