CREATE TABLE (Azure SQL Data Warehouse)

THIS TOPIC APPLIES TO: noSQL Server noAzure SQL DatabaseyesAzure SQL Data Warehouse yesParallel Data Warehouse

Creates a new table in SQL Data Warehouse or Parallel Data Warehouse.

To understand tables and how to use them, see Tables in SQL Data Warehouse.

NOTE: Discussions about SQL Data Warehouse in this article apply to both SQL Data Warehouse and Parallel Data Warehouse unless otherwise noted.

Topic link icon Transact-SQL Syntax Conventions

Syntax

-- Syntax for Azure SQL Data Warehouse and Parallel Data Warehouse  

-- Create a new table. 
CREATE TABLE [ database_name . [ schema_name ] . | schema_name. ] table_name   
    ( 
      { column_name <data_type>  [ <column_options> ] } [ ,...n ]   
    )  
    [ WITH [ <table_option> [ ,...n ] ) ]  
[;]  

<column_options> ::=
    [ COLLATE Windows_collation_name ]  
    [ NULL | NOT NULL ] -- default is NULL  
    [ [ CONSTRAINT constraint_name ] DEFAULT constant_expression  ]

<table_option> ::= 
    {   
        CLUSTERED COLUMNSTORE INDEX --default for SQL Data Warehouse 
      | HEAP --default for Parallel Data Warehouse   
      | CLUSTERED INDEX ( { index_column_name [ ASC | DESC ] } [ ,...n ] ) -- default is ASC 
    }  
    { 
        DISTRIBUTION = HASH ( distribution_column_name ) 
      | DISTRIBUTION = ROUND_ROBIN -- default for SQL Data Warehouse
      | DISTRIBUTION = REPLICATE -- default for Parallel Data Warehouse
    }   
    | PARTITION ( partition_column_name RANGE [ LEFT | RIGHT ] -- default is LEFT  
        FOR VALUES ( [ boundary_value [,...n] ] ) )  

<data type> ::=   
      datetimeoffset [ ( n ) ]  
    | datetime2 [ ( n ) ]  
    | datetime  
    | smalldatetime  
    | date  
    | time [ ( n ) ]  
    | float [ ( n ) ]  
    | real [ ( n ) ]  
    | decimal [ ( precision [ , scale ] ) ]   
    | numeric [ ( precision [ , scale ] ) ]   
    | money  
    | smallmoney  
    | bigint  
    | int   
    | smallint  
    | tinyint  
    | bit  
    | nvarchar [ ( n | max ) ]  -- max applies only to SQL Data Warehouse 
    | nchar [ ( n ) ]  
    | varchar [ ( n | max )  ] -- max applies only to SQL Data Warehouse  
    | char [ ( n ) ]  
    | varbinary [ ( n | max ) ] -- max applies only to SQL Data Warehouse  
    | binary [ ( n ) ]  
    | uniqueidentifier  

Arguments

database_name
The name of the database that will contain the new table. The default is the current database.

schema_name
The schema for the table. Specifying schema is optional. If blank, the default schema will be used.

table_name
The name of the new table. To create a local temporary table, precede the table name with #. For explanations and guidance on temporary tables, see Temporary tables in Azure SQL Data Warehouse.

column_name
The name of a table column.

Column options

COLLATE Windows_collation_name
Specifies the collation for the expression. The collation must be one of the Windows collations supported by SQL Server. For a list of Windows collations supported by SQL Server, see Windows Collation Name (Transact-SQL).

NULL | NOT NULL
Specifies whether NULL values are allowed in the column. The default is NULL.

[ CONSTRAINT constraint_name ] DEFAULT constant_expression
Specifies the default column value.

Argument Explanation
constraint_name The optional name for the constraint. The constraint name is unique within the database. The name can be re-used in other databases.
constant_expression The default value for the column. The expression must be a literal value or a a constant. For example, these constant expressions are allowed: 'CA', 4. These are not allowed: 2+3, CURRENT_TIMESTAMP.

Table structure options

For guidance on choosing the type of table, see Indexing tables in Azure SQL Data Warehouse.

CLUSTERED COLUMNSTORE INDEX
Stores the table as a clustered columnstore index. The clustered columnstore index applies to all of the table data. This is the default for SQL Data Warehouse.

HEAP
Stores the table as a heap. This is the default for Parallel Data Warehouse.

CLUSTERED INDEX ( index_column_name [ ,...n ] )
Stores the table as a clustered index with one or more key columns. This stores the data by row. Use index_column_name to specify the name of one or more key columns in the index. For more information, see Rowstore Tables in the General Remarks.

LOCATION = USER_DB
This option is deprecated. It is syntactically accepted, but no longer required and no longer affects behavior.

Table distribution options

To understand how to choose the best distribution method and use distributed tables, see Distributing tables in Azure SQL Data Warehouse.

DISTRIBUTION = HASH ( distribution_column_name )
Assigns each row to one distribution by hashing the value stored in distribution_column_name. The algorithm is deterministic which means it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL since all rows that have NULL will be assigned to the same distribution.

DISTRIBUTION = ROUND_ROBIN
Distributes the rows evenly across all the distributions in a round-robin fashion. This is the default for SQL Data Warehouse.

DISTRIBUTION = REPLICATE
Stores one copy of the table on each Compute node. For SQL Data Warehouse the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in a SQL Server filegroup that spans the Compute node. This is the default for Parallel Data Warehouse.

Table partition options

For guidance on using table partitions, see Partitioning tables in SQL Data Warehouse.

PARTITION ( partition_column_name RANGE [ LEFT | RIGHT ] FOR VALUES ( [ boundary_value [,...n] ] ))
Creates one or more table partitions. These are horizontal table slices that allow you to perform operations on subsets of rows regardless of whether the table is stored as a heap, clustered index, or clustered columnstore index. Unlike the distribution column, table partitions do not determine the distribution where each row is stored. Instead, table partitions determine how the rows are grouped and stored within each distribution.

Argument Explanation
partition_column_name Specifies the column that SQL Data Warehouse will use to partition the rows. This column can be any data type. SQL Data Warehouse sorts the partition column values in ascending order. The low-to-high ordering goes from LEFT to RIGHT for the purpose of the RANGE specification.
RANGE LEFT Specifies the boundary value belongs to the partition on the left (lower values). The default is LEFT.
RANGE RIGHT Specifies the boundary value belongs to the partition on the right (higher values).
FOR VALUES ( boundary_value [,...n] ) Specifies the boundary values for the partition. boundary_value is a constant expression. It cannot be NULL. It must either match or be implicitly convertible to the data type of partition_column_name. It cannot be truncated during implicit conversion so that the size and scale of the value do not match the data type of partition_column_name

If you specify the PARTITION clause, but do not specify a boundary value, SQL Data Warehouse will create a partitioned table with one partition. If applicable, you an split the table into two partitions at a later time.

If you specify one boundary value, the resulting table has two partitions; one for the values lower than the boundary value and one for the values higher than the boundary value. Note that if you move a partition into a non-partitioned table, the non-partitioned table will receive the data, but will not have the partition boundaries in its metadata.

See Create a partitioned table in the Examples section.

Data types

SQL Data Warehouse supports the most commonly used data types. Below is a list of the supported data types along with their details and storage bytes. To better understand data types and how to use them, see Data types for tables in SQL Data Warehouse.

For a table of data type conversions, see the Implicit Conversions section, of CAST and CONVERT (Transact-SQL).

datetimeoffset [ ( n ) ]
The default value for n is 7.

datetime2 [ ( n ) ]
Same as datetime, except that you can specify the number of fractional seconds. The default value for n is 7.

n value Precision Scale
0 19 0
1 21 1
2 22 2
3 23 3
4 24 4
5 25 5
6 26 6
7 27 7

datetime
Stores date and time of day with 19 to 23 characters according to the Gregorian calendar. The date can contain year, month, and day. The time contains hour, minutes, seconds.As an option, you can display three digits for fractional seconds. The storage size is 8 bytes.

smalldatetime
Stores a date and a time. Storage size is 4 bytes.

date
Stores a date using a maximum of 10 characters for year, month, and day according to the Gregorian calendar. The storage size is 3 bytes. Date is stored as an integer.

time [ ( n ) ]
The default value for n is 7.

float [ ( n ) ]
Approximate number data type for use with floating point numeric data. Floating point data is approximate, which means that not all values in the data type range can be represented exactly. n specifies the number of bits used to store the mantissa of the float in scientific notation. Therefore, n dictates the precision and storage size. If n is specified, it must be a value between 1 and 53. The default value of n is 53.

n value Precision Storage size
1-24 7 digits 4 bytes
25-53 15 digits 8 bytes

SQL Data Warehouse treats n as one of two possible values. If 1<= n <= 24, n is treated as 24. If 25 <= n <= 53, n is treated as 53.

The SQL Data Warehouse float data type complies with the ISO standard for all values of n from 1 through 53. The synonym for double precision is float(53).

real [ ( n ) ]
The definition of real is the same as float. The ISO synonym for real is float(24).

decimal [ ( precision [ , scale ] ) ] | numeric [ ( precision [ , scale ] ) ]
Stores fixed precision and scale numbers.

precision
The maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision of 38. The default precision is 18.

scale
The maximum number of decimal digits that can be stored to the right of the decimal point. Scale must be a value from 0 through precision. You can only specify scale if precision is specified. The default scale is 0; therefore, 0 <= scale <= precision. Maximum storage sizes vary, based on the precision.

Precision Storage bytes
1-9 5
10-19 9
20-28 13
29-38 17

money | smallmoney
Data types that represent currency values.

Data Type Storage bytes
money 8
smallmoney 4

bigint | int | smallint | tinyint
Exact-number data types that use integer data. The storage is shown in the following table.

Data Type Storage bytes
bigint 8
int 4
smallint 2
tinyint 1

bit
An integer data type that can take the value of 1, 0, or `NULL. SQL Data Warehouse optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9-16 bit columns, the columns are stored as 2 bytes, and so on.

nvarchar [ ( n | max ) ] -- max applies only to SQL Data Warehouse.
Variable-length Unicode character data. n can be a value from 1 through 4000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). Storage size in bytes is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length.

nchar [ ( n ) ]
Fixed-length Unicode character data with a length of n characters. n must be a value from 1 through 4000. The storage size is two times n bytes.

varchar [ ( n | max ) ] -- max applies only to SQL Data Warehouse.
Variable-length, non-Unicode character data with a length of n bytes. n must be a value from 1 to 8000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB).The storage size is the actual length of data entered + 2 bytes.

char [ ( n ) ]
Fixed-length, non-Unicode character data with a length of n bytes. n must be a value from 1 to 8000. The storage size is n bytes. The default for n is 1.

varbinary [ ( n | max ) ] -- max applies only to SQL Data Warehouse.
Variable-length binary data. n can be a value from 1 to 8000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of data entered + 2 bytes. The default value for n is 7.

binary [ ( n ) ]
Fixed-length binary data with a length of n bytes. n can be a value from 1 to 8000. The storage size is n bytes. The default value for n is 7.

uniqueidentifier
Is a 16-byte GUID.

Permissions

Creating a table requires permission in the db_ddladmin fixed database role, or:

  • CREATE TABLE permission on the database
  • ALTER SCHEMA permission on the schema that will contain the table.

Creating a partitioned table requires permission in the db_ddladmin fixed database role, or

  • ALTER ANY DATASPACE permission

    The login that creates a local temporary table receives CONTROL, INSERT, SELECT, and UPDATE permissions on the table.

General Remarks

For minimum and maximum limits, see SQL Data Warehouse capacity limits.

Determining the number of table partitions

Each user-defined table is divided into multiple smaller tables which are stored in separate locations called distributions. SQL Data Warehouse uses 60 distributions. In Parallel Data Warehouse, the number of distributions depends on the number of Compute nodes.

Each distribution contains all table partitions. For example, if there are 60 distributions and four table partitions, there will be 320 partitions. If the table is a clustered columnstore index, there will be one columnstore index per partition which means you will have 320 columnstore indexes.

We recommend using fewer table partitions to ensure each columnstore index has enough rows to take advantage of the benefits of columnstore indexes. For further guidance, see Partitioning tables in SQL Data Warehouse and Indexing tables in SQL Data Warehouse

Rowstore table (heap or clustered index)

A rowstore table is a table stored in row-by-row order. It is a heap or clustered index. SQL Data Warehouse creates all rowstore tables with page compression; this is not user-configurable.

Columnstore table (columnstore index)

A columnstore table is a table stored in column-by-column order. The columnstore index is the technology that manages data stored in a columnstore table. The clustered columnstore index does not affect how data are distributed; it affects how the data are stored within each distribution.

To change a rowstore table to a columnstore table, drop all existing indexes on the table and create a clustered columnstore index. For an example, see CREATE COLUMNSTORE INDEX (Transact-SQL).

For more information, see these articles:

Limitations and Restrictions

You cannot define a DEFAULT constraint on a distribution column.

Partitions

When using partitions, the partition column cannot have a Unicode-only collation. For example, the following statement fails.

CREATE TABLE t1 ( c1 varchar(20) COLLATE Divehi_90_CI_AS_KS_WS) WITH (PARTITION (c1 RANGE FOR VALUES (N'')))

If boundary_value is a literal value that must be implicitly converted to the data type in partition_column_name, a discrepancy will occur. The literal value is displayed through the SQL Data Warehouse system views, but the converted value is used for Transact-SQL operations.

Temporary tables

Global temporary tables that begin with ## are not supported.

Local temporary tables have the following limitations and restrictions:

  • They are visible only to the current session. SQL Data Warehouse drops them automatically at the end of the session. To drop them explicitlt, use the DROP TABLE statement.
  • They cannot be renamed.
  • They cannot have partitions or views.
  • Their permissions cannot be changed. GRANT, DENY, and REVOKE statements cannot be used with local temporary tables.
  • Database console commands are blocked for temporary tables.
  • If more than one local temporary table is used within a batch, each must have a unique name. If multiple sessions are running the same batch and creating the same local temporary table, SQL Data Warehouse internally appends a numeric suffix to the local temporary table name to maintain a unique name for each local temporary table.

Locking behavior

Takes an exclusive lock on the table. Takes a shared lock on the DATABASE, SCHEMA, and SCHEMARESOLUTION objects.

Examples for columns

A. Specify a column collation

In the following example, the table MyTable is created with two different column collations. By default, the column, mycolumn1, has the default collation Latin1_General_100_CI_AS_KS_WS. The column, mycolumn2 has the collation Frisian_100_CS_AS.

CREATE TABLE MyTable   
  (  
    mycolumnnn1 nvarchar,  
    mycolumn2 nvarchar COLLATE Frisian_100_CS_AS )  
WITH ( CLUSTERED COLUMNSTORE INDEX )  
;  

B. Specify a DEFAULT constraint for a column

The following example shows the syntax to specify a default value for a column. The colA column has a default constraint named constraint_colA and a default value of 0.


CREATE TABLE MyTable   
  (  
    colA int CONSTRAINT constraint_colA DEFAULT 0,  
    colB nvarchar COLLATE Frisian_100_CS_AS   
  )  
WITH ( CLUSTERED COLUMNSTORE INDEX )  
;  

Examples for temporary tables

C. Create a local temporary table

The following creates a local temporary table named #myTable. The table is specified with a 3-part name. The temporary table name starts with a #.

CREATE TABLE AdventureWorks.dbo.#myTable   
  (  
   id int NOT NULL,  
   lastName varchar(20),  
   zipCode varchar(6)  
  )  
WITH  
  (   
    DISTRIBUTION = HASH (id),  
    CLUSTERED COLUMNSTORE INDEX   
  )  
;  

Examples for table structure

D. Create a table with a clustered columnstore index

The following example creates a distributed table with a clustered columnstore index. Each distribution will be stored as a columnstore.

The clustered columnstore index does not affect how the data is distributed; data is always distributed by row. The clustered columnstore index affects how the data is stored within each distribution.


CREATE TABLE MyTable   
  (  
    colA int CONSTRAINT constraint_colA DEFAULT 0,  
    colB nvarchar COLLATE Frisian_100_CS_AS   
  )  
WITH   
  (   
    DISTRIBUTION = HASH ( colB ),  
    CLUSTERED COLUMNSTORE INDEX   
  )  
;  

Examples for table distribution

E. Create a ROUND_ROBIN table

The following example creates a ROUND_ROBIN table with three columns and without partitions. The data is spread across all distributions. The table is created with a CLUSTERED COLUMNSTORE INDEX, which gives better performance and data compression than a heap or rowstore clustered index.

CREATE TABLE myTable   
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH ( CLUSTERED COLUMNSTORE INDEX );  

F. Create a hash-distributed table

The following example creates the same table as the previous example. However, for this table, rows are distributed (on the id column) instead of randomly spread like a ROUND_ROBIN table. The table is created with a CLUSTERED COLUMNSTORE INDEX, which gives better performance and data compression than a heap or rowstore clustered index.

CREATE TABLE myTable   
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH  
  (   
    DISTRIBUTION = HASH (id),   
    CLUSTERED COLUMNSTORE INDEX  
  );  

G. Create a replicated table

The following example creates a replicated table similar to the previous examples. Replicated tables are copied in full to each Compute node. With this copy on each Compute node, data movement is reduced for queries. This example is created with a CLUSTERED INDEX, which gives better data compression than a heap and may not contain enough rows to achieve good CLUSTERED COLUMNSTORE INDEX compression.

CREATE TABLE myTable   
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH  
  (   
    DISTRIBUTION = REPLICATE,   
    CLUSTERED INDEX (lastName)  
  );  

Examples for table partitions

H. Create a partitioned table

The following example creates the same table as shown in example A, with the addition of RANGE LEFT partitioning on the id column. It specifies four partition boundary values, which results in five partitions.

CREATE TABLE myTable   
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode int)  
WITH   
  (   

    PARTITION ( id RANGE LEFT FOR VALUES (10, 20, 30, 40 )),  
    CLUSTERED COLUMNSTORE INDEX      
  )  
;  

In this example, data will be sorted into the following partitions:

  • Partition 1: col <= 10
  • Partition 2: 10 < col <= 20
  • Partition 3: 20 < col <= 30
  • Partition 4: 30 < col <= 40
  • Partition 5: 40 < col

    If this same table was partitioned RANGE RIGHT instead of RANGE LEFT (default), the data will be sorted into the following partitions:

  • Partition 1: col < 10

  • Partition 2: 10 <= col < 20
  • Partition 3: 20 <= col < 30
  • Partition 4: 30 <= col < 40
  • Partition 5: 40 <= col

I. Create a partitioned table with one partition

The following example creates a partitioned table with one partition. It does not specify any boundary values, which results in one partition.

CREATE TABLE myTable (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode int)  
WITH   
    (   
      PARTITION ( id RANGE LEFT FOR VALUES ( )),  
      CLUSTERED COLUMNSTORE INDEX  
    )  
;  

J. Create a table with date partitioning

The following example creates a new table named myTable, with partitioning on a date column. By using RANGE RIGHT and dates for the boundary values, it puts a month of data in each partition.

CREATE TABLE myTable (  
    l_orderkey      bigint,       
    l_partkey       bigint,                                             
    l_suppkey       bigint,                                           
    l_linenumber    bigint,        
    l_quantity      decimal(15,2),  
    l_extendedprice decimal(15,2),  
    l_discount      decimal(15,2),  
    l_tax           decimal(15,2),  
    l_returnflag    char(1),  
    l_linestatus    char(1),  
    l_shipdate      date,  
    l_commitdate    date,  
    l_receiptdate   date,  
    l_shipinstruct  char(25),  
    l_shipmode      char(10),  
    l_comment       varchar(44))  
WITH   
  (   
    DISTRIBUTION = HASH (l_orderkey),  
    CLUSTERED COLUMNSTORE INDEX,  
    PARTITION ( l_shipdate  RANGE RIGHT FOR VALUES   
      (  
        '1992-01-01','1992-02-01','1992-03-01','1992-04-01','1992-05-01',
        '1992-06-01','1992-07-01','1992-08-01','1992-09-01','1992-10-01',
        '1992-11-01','1992-12-01','1993-01-01','1993-02-01','1993-03-01',
        '1993-04-01','1993-05-01','1993-06-01','1993-07-01','1993-08-01',
        '1993-09-01','1993-10-01','1993-11-01','1993-12-01','1994-01-01',
        '1994-02-01','1994-03-01','1994-04-01','1994-05-01','1994-06-01',
        '1994-07-01','1994-08-01','1994-09-01','1994-10-01','1994-11-01',
        '1994-12-01'  
      ))
  );  

See also

CREATE TABLE AS SELECT (Azure SQL Data Warehouse)
DROP TABLE (Transact-SQL)
ALTER TABLE (Transact-SQL)