SQL Graph Architecture
Applies to: SQL Server 2017 (14.x) and later, Azure SQL Database, Azure SQL Managed Instance
Learn how SQL Graph is architected. Knowing the basics will make it easier to understand other SQL Graph articles.
SQL Graph Database
Users can create one graph per database. A graph is a collection of node and edge tables. Node or edge tables can be created under any schema in the database, but they all belong to one logical graph. A node table is a collection of nodes of a similar type. For example, a Person node table holds all the Person nodes belonging to a graph. Similarly, an edge table is a collection of edges of a similar type. For example, a Friends edge table holds all the edges that connect a Person to another Person. Since nodes and edges are stored in tables, most of the operations supported on regular tables are supported on node or edge tables.
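As a minimal sketch of the Person and Friends example above, the following Transact-SQL creates one node table and one edge table. The dbo schema, the column names, and the data types are assumptions chosen for illustration; only the AS NODE and AS EDGE clauses make these graph tables.

```sql
-- Illustrative only: table and column names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

-- Node table: each row is one Person node.
CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

-- Edge table: each row is one directed edge.
-- start_date is an optional user-defined attribute.
CREATE TABLE dbo.Friends (
    start_date DATE
) AS EDGE;
```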
Figure 1: SQL Graph database architecture
A node table represents an entity in a graph schema. Every time a node table is created, along with the user-defined columns, an implicit $node_id column is created, which uniquely identifies a given node in the database. The values in $node_id are automatically generated and are a combination of the object_id of that node table and an internally generated bigint value. However, when the $node_id column is selected, a computed value in the form of a JSON string is displayed. Also, $node_id is a pseudo-column that maps to an internal name with a hex string in it. When you select $node_id from the table, the column name will appear as $node_id_<hex_string>.
Using the pseudo-columns in queries is the only supported and recommended way of querying the internal $node_id column. You should not directly use the $node_id_<hex_string> columns in any queries.
Further, the computed JSON representation shown in the pseudo-columns is an implementation detail. You should not take a direct dependency on the format of that JSON representation. If you must deal with this JSON representation, consider using NODE_ID_FROM_PARTS() and the other related system functions.
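The following sketch shows the supported way to read the node identifier: through the $node_id pseudo-column rather than the internal hex-suffixed name. The dbo.Person table and its columns are hypothetical, and the JSON text returned for $node_id should be treated as opaque.

```sql
-- Illustrative only: table and column names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

INSERT INTO dbo.Person (ID, name) VALUES (1, 'Alice');

-- Reference the pseudo-column as $node_id. The result set labels the
-- column with the internal $node_id_<hex_string> name, which should not
-- be used directly in queries.
SELECT $node_id, ID, name
FROM dbo.Person;
```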
It is not recommended to directly use the graph pseudo-columns ($node_id, $from_id, $to_id) in predicates. For example, a predicate like n.$node_id = e.$from_id should be avoided. Such comparisons tend to be inefficient because of the conversion to the JSON representation. Instead, rely on the MATCH function as far as possible.
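As a sketch of that recommendation, the query below finds friends with MATCH instead of joining on n.$node_id = e.$from_id. The tables, columns, and sample rows are hypothetical.

```sql
-- Illustrative only: table and column names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

CREATE TABLE dbo.Friends AS EDGE;

INSERT INTO dbo.Person (ID, name) VALUES (1, 'Alice'), (2, 'Bob');

-- Edge from Alice to Bob.
INSERT INTO dbo.Friends ($from_id, $to_id)
VALUES ((SELECT $node_id FROM dbo.Person WHERE ID = 1),
        (SELECT $node_id FROM dbo.Person WHERE ID = 2));

-- Traverse Person -> Friends -> Person with MATCH rather than
-- comparing pseudo-columns in join predicates.
SELECT p1.name AS person_name, p2.name AS friend_name
FROM dbo.Person AS p1, dbo.Friends AS f, dbo.Person AS p2
WHERE MATCH(p1-(f)->p2);
```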
It is recommended that users create a unique constraint or index on the $node_id column when the node table is created, but if one is not created, a default unique, nonclustered index is automatically created. However, any index on a graph pseudo-column is created on the underlying internal columns. That is, an index created on the $node_id column will appear on the internal graph_id_<hex_string> column.
An edge table represents a relationship in a graph. Edges are always directed and connect two nodes. An edge table enables users to model many-to-many relationships in the graph. An edge table may or may not have any user-defined attributes in it. Every time an edge table is created, along with the user-defined attributes, three implicit columns are created in the edge table:
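|Column name||Description|
|$edge_id||Uniquely identifies a given edge in the database. It is a generated column and the value is a combination of the object_id of the edge table and an internally generated bigint value. However, when the $edge_id column is selected, a computed value in the form of a JSON string is displayed.|
|$from_id||Stores the $node_id of the node from which the edge originates.|
|$to_id||Stores the $node_id of the node at which the edge terminates.|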
The nodes that a given edge can connect are governed by the data inserted in the $from_id and $to_id columns. In the first release, it is not possible to define constraints on the edge table to restrict which types of nodes it connects. That is, an edge can connect any two nodes in the graph, regardless of their types.
Similar to the $node_id column, it is recommended that users create a unique index or constraint on the $edge_id column when the edge table is created, but if one is not created, a default unique, nonclustered index is automatically created on this column. However, any index on a graph pseudo-column is created on the underlying internal columns. That is, an index created on the $edge_id column will appear on the internal graph_id_<hex_string> column. It is also recommended, for OLTP scenarios, that users create an index on the ($from_id, $to_id) columns, for faster lookups in the direction of the edge.
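A minimal sketch of the OLTP recommendation above: an index on ($from_id, $to_id) of a hypothetical Friends edge table. The table and index names are assumptions, and the index is added here with a separate CREATE INDEX statement rather than at table creation.

```sql
-- Illustrative only: table, column, and index names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

CREATE TABLE dbo.Friends (
    start_date DATE
) AS EDGE;

-- Supports lookups that follow edges from their source node to their
-- target node. The index is defined on the pseudo-columns and is created
-- on the underlying internal columns.
CREATE NONCLUSTERED INDEX IX_Friends_from_to
    ON dbo.Friends ($from_id, $to_id);
```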
Figure 2 shows how node and edge tables are stored in the database.
Figure 2: Node and edge table representation
Use these metadata views to see attributes of a node or edge table.
The following new bit-type columns are added to sys.tables. If is_node is set to 1, the table is a node table, and if is_edge is set to 1, the table is an edge table.
|Column Name||Data Type||Description|
|is_node||bit||1 = this is a node table|
|is_edge||bit||1 = this is an edge table|
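For example, a query along the following lines lists every node and edge table in the current database using these two columns. The column aliases are arbitrary choices for this sketch.

```sql
-- List graph tables and whether each is a node or an edge table.
SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
       t.name                   AS table_name,
       t.is_node,
       t.is_edge
FROM sys.tables AS t
WHERE t.is_node = 1
   OR t.is_edge = 1
ORDER BY schema_name, table_name;
```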
The sys.columns view contains the additional columns graph_type and graph_type_desc, which indicate the type of the column in node and edge tables:
|Column Name||Data Type||Description|
|graph_type||int||Internal column with a set of values. The values range from 1 to 8 for graph columns and are NULL for others.|
|graph_type_desc||nvarchar(60)||Internal column with a set of values.|
The following table lists the valid values for the graph_type column:
sys.columns also stores information about the implicit columns created in node or edge tables. The following information can be retrieved from sys.columns; however, users cannot select these columns from a node or edge table.
The implicit columns in a node table are:
|Column Name||Data Type||is_hidden||Comment|
The implicit columns in an edge table are:
|Column Name||Data Type||is_hidden||Comment|
|from_obj_id_<hex_string>||INT||1||Internal from node|
|from_id_<hex_string>||BIGINT||1||Internal from node|
|$from_id_<hex_string>||NVARCHAR||0||External from node|
|to_obj_id_<hex_string>||INT||1||Internal to node|
|to_id_<hex_string>||BIGINT||1||Internal to node|
|$to_id_<hex_string>||NVARCHAR||0||External to node|
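The implicit columns can be inspected with a query such as the sketch below, which joins sys.columns to sys.tables and returns only graph columns. The aliases and the filter are illustrative choices.

```sql
-- Show the implicit graph columns of every node and edge table,
-- including the hidden internal ones.
SELECT OBJECT_NAME(c.object_id)      AS table_name,
       c.name                        AS column_name,
       TYPE_NAME(c.system_type_id)   AS data_type,
       c.is_hidden,
       c.graph_type,
       c.graph_type_desc
FROM sys.columns AS c
JOIN sys.tables  AS t
    ON t.object_id = c.object_id
WHERE (t.is_node = 1 OR t.is_edge = 1)
  AND c.graph_type IS NOT NULL
ORDER BY table_name, c.column_id;
```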
The following built-in functions are added. They help users extract information from the generated columns. Note that these functions do not validate the input from the user. If the user specifies an invalid sys.node_id, the function will extract the appropriate part and return it. For example, OBJECT_ID_FROM_NODE_ID takes a $node_id as input and returns the object_id of the table that the node belongs to.
|OBJECT_ID_FROM_NODE_ID||Extract the object_id from a node_id|
|GRAPH_ID_FROM_NODE_ID||Extract the graph_id from a node_id|
|NODE_ID_FROM_PARTS||Construct a node_id from an object_id and a graph_id|
|GRAPH_ID_FROM_EDGE_ID||Extract identity from an edge_id|
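The sketch below applies the node functions to a hypothetical dbo.Person node table: it splits each $node_id into its two parts and then rebuilds a node_id from them. The table, its columns, and the result aliases are assumptions for illustration.

```sql
-- Illustrative only: table and column names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

INSERT INTO dbo.Person (ID, name) VALUES (1, 'Alice'), (2, 'Bob');

SELECT name,
       -- object_id of the node table the node belongs to
       OBJECT_ID_FROM_NODE_ID($node_id) AS node_table_object_id,
       -- internally generated bigint part of the node_id
       GRAPH_ID_FROM_NODE_ID($node_id)  AS node_graph_id,
       -- rebuild a node_id from the two parts
       NODE_ID_FROM_PARTS(OBJECT_ID(N'dbo.Person'),
                          GRAPH_ID_FROM_NODE_ID($node_id)) AS rebuilt_node_id
FROM dbo.Person;
```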
Learn the Transact-SQL extensions introduced in SQL Server and Azure SQL Database that enable creating and querying graph objects. The query language extensions help query and traverse the graph using ASCII art syntax.
Data Definition Language (DDL) statements
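|CREATE TABLE||CREATE TABLE (Transact-SQL)||CREATE TABLE is extended to support creating a table AS NODE or AS EDGE. An edge table may or may not have any user-defined attributes.|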
|ALTER TABLE||ALTER TABLE (Transact-SQL)||Node and edge tables can be altered the same way a relational table is, using ALTER TABLE.|
|CREATE INDEX||CREATE INDEX (Transact-SQL)||Users can create indexes on pseudo-columns and user-defined columns in node and edge tables. All index types are supported, including clustered and nonclustered columnstore indexes.|
|CREATE EDGE CONSTRAINTS||EDGE CONSTRAINTS (Transact-SQL)||Users can now create edge constraints on edge tables to enforce specific semantics and also maintain data integrity.|
|DROP TABLE||DROP TABLE (Transact-SQL)||Node and edge tables can be dropped the same way a relational table is, using DROP TABLE.|
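As a hedged sketch of the DDL statements above, the following creates a node table, creates an edge table with an edge constraint, alters the edge table, and drops both. All object names are hypothetical, and the CONNECTION clause assumes a version that supports edge constraints, which the first release does not.

```sql
-- Illustrative only: all object names are hypothetical.
DROP TABLE IF EXISTS Friends;
DROP TABLE IF EXISTS Person;

CREATE TABLE Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

-- The edge constraint restricts Friends edges to Person -> Person.
CREATE TABLE Friends (
    start_date DATE,
    CONSTRAINT EC_Friends_Person CONNECTION (Person TO Person)
) AS EDGE;

-- Altered the same way as a relational table.
ALTER TABLE Friends ADD note VARCHAR(200) NULL;

-- Dropped the same way as a relational table; the edge table goes first
-- because its constraint references the node table.
DROP TABLE Friends;
DROP TABLE Person;
```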
Data Manipulation Language (DML) statements
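|INSERT||INSERT (Transact-SQL)||Inserting into a node table is no different than inserting into a relational table. The values for the $node_id column are automatically generated. When inserting into an edge table, users supply values for the $from_id and $to_id columns.|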
|DELETE||DELETE (Transact-SQL)||Data from node or edge tables can be deleted in the same way as it is deleted from relational tables. However, in this release, there are no constraints to ensure that no edges point to a deleted node, and cascaded deletion of edges upon deletion of a node is not supported. It is recommended that whenever a node is deleted, all the connecting edges to that node are also deleted, to maintain the integrity of the graph.|
|UPDATE||UPDATE (Transact-SQL)||Values in user-defined columns can be updated using the UPDATE statement. Updating the internal graph columns, $node_id, $edge_id, $from_id, and $to_id, is not permitted.|
|SELECT||SELECT (Transact-SQL)||Nodes and edges are stored as tables internally, hence most of the operations supported on a table in SQL Server or Azure SQL Database are supported on the node and edge tables.|
|MATCH||MATCH (Transact-SQL)||The MATCH built-in is introduced to support pattern matching and traversal through the graph.|
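The DML behavior in this table can be sketched as follows: insert nodes, insert an edge by supplying $from_id and $to_id, update a user-defined column, and delete a node only after deleting the edges that touch it. Table names, columns, and sample values are hypothetical.

```sql
-- Illustrative only: table and column names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

CREATE TABLE dbo.Friends (
    start_date DATE
) AS EDGE;

-- $node_id is generated automatically; only user columns are supplied.
INSERT INTO dbo.Person (ID, name) VALUES (1, 'Alice'), (2, 'Bob');

-- An edge insert supplies the $node_id values of the connected nodes.
INSERT INTO dbo.Friends ($from_id, $to_id, start_date)
VALUES ((SELECT $node_id FROM dbo.Person WHERE ID = 1),
        (SELECT $node_id FROM dbo.Person WHERE ID = 2),
        '2020-01-01');

-- User-defined columns can be updated as usual.
UPDATE dbo.Person SET name = 'Robert' WHERE ID = 2;

-- Delete the edges that touch a node before deleting the node itself,
-- since edge deletion is not cascaded.
DELETE FROM dbo.Friends
WHERE $from_id = (SELECT $node_id FROM dbo.Person WHERE ID = 2)
   OR $to_id   = (SELECT $node_id FROM dbo.Person WHERE ID = 2);

DELETE FROM dbo.Person WHERE ID = 2;
```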
Limitations and known issues
There are certain limitations on node and edge tables in this release:
- Local or global temporary tables cannot be node or edge tables.
- Table types and table variables cannot be declared as a node or edge table.
- Node and edge tables cannot be created as system-versioned temporal tables.
- Node and edge tables cannot be memory optimized tables.
- Users cannot update the $from_id and $to_id columns of an edge using the UPDATE statement. To update the nodes that an edge connects, users must insert a new edge pointing to the new nodes and delete the previous one.
- Cross database queries on graph objects are not supported.
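The workaround for the UPDATE limitation above can be sketched as an insert followed by a delete. Wrapping the two statements in one transaction is a design choice of this sketch, not something the limitation requires; the tables and sample rows are hypothetical.

```sql
-- Illustrative only: table and column names are hypothetical.
DROP TABLE IF EXISTS dbo.Friends;
DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person (
    ID   INTEGER PRIMARY KEY,
    name VARCHAR(100)
) AS NODE;

CREATE TABLE dbo.Friends AS EDGE;

INSERT INTO dbo.Person (ID, name)
VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');

-- Existing edge: Alice -> Bob.
INSERT INTO dbo.Friends ($from_id, $to_id)
VALUES ((SELECT $node_id FROM dbo.Person WHERE ID = 1),
        (SELECT $node_id FROM dbo.Person WHERE ID = 2));

-- Re-point the edge to Carol: insert the new edge, then delete the old one.
BEGIN TRANSACTION;

INSERT INTO dbo.Friends ($from_id, $to_id)
VALUES ((SELECT $node_id FROM dbo.Person WHERE ID = 1),
        (SELECT $node_id FROM dbo.Person WHERE ID = 3));

DELETE FROM dbo.Friends
WHERE $from_id = (SELECT $node_id FROM dbo.Person WHERE ID = 1)
  AND $to_id   = (SELECT $node_id FROM dbo.Person WHERE ID = 2);

COMMIT TRANSACTION;
```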
To get started with the new syntax, see SQL Graph Database - Sample.