JSON Support in SQL Server 2016

JSON support in SQL Server is one of the most highly ranked requests, with more than 1000 votes on the Microsoft Connect site. We have announced that SQL Server 2016 will have built-in JSON support. Note that this will not be an exact copy of the native XML support that has existed in SQL Server since 2005. Our goal is to create a simpler but still useful framework for processing JSON documents. In this post I will give a brief overview of the JSON features that are planned for SQL Server 2016. The rough timeline for when these features will be available is:

  • SQL Server 2016 CTP2 - ability to format and export data as a JSON string - you can find more information about this feature in a separate post.
  • SQL Server 2016 CTP3 - ability to load JSON text into tables, extract values from JSON text, index properties in JSON text stored in columns, etc.

JSON representation

The first thing we should be aware of is that built-in JSON support is not the same as a native JSON type. In SQL Server 2016, JSON will be represented as the NVARCHAR type. The reasons are:

  • Migration - We found that people already store JSON as text, so they would need to change their database schema and reload data to use our new features if we introduced a separate JSON type. In our implementation, developers can use JSON functionality without any changes to the database.
  • Cross-feature compatibility – NVARCHAR is supported in all SQL Server components, so JSON will also be supported everywhere. You can put JSON in Hekaton, temporal, or columnstore tables, apply standard security policies including row-level security, use standard B-tree and full-text search indexes, use JSON as a parameter or return value of procedures and UDFs, etc. You don't need to wonder whether JSON works with feature X – if NVARCHAR works with feature X, then JSON will also work. There are some constraints – Hekaton and columnstore do not support LOB values, so you can put only small JSON documents there. However, once we add LOB support in Hekaton and columnstore, you will be able to store large JSON documents everywhere.
  • Client-side support – Currently there is no standardized JSON object type in client apps (something like the XmlDom object). Web and mobile applications and JavaScript clients will naturally use JSON text and parse it with their native parsers. In JavaScript, the type that represents JSON is object. It is unlikely that client frameworks will implement a proxy for a JSON type that exists in only a few RDBMSs. In C#/.NET, a lot of developers use the JSON.NET parser with its built-in JObject and JArray types; however, it is not a standard and will probably not be part of ADO.NET. Even in that case, we believe that C# apps will accept plain strings from the database layer and parse them with their favorite parser. We are not talking only about apps: if you try to use a JSON column in SSIS/SSRS, Tableau, or Informatica ETL, these tools will still see it as text. We believe that even if we added a JSON type, it would be represented as a string outside SQL Server and parsed with a custom parser if needed. Therefore, we have not found any major reason to implement a native JSON type.

Note that you can still use your own JSON type implemented using CLR, by importing the JSON.NET library or something similar. Even if you don't want to code CLR UDTs yourself, you can find and download a lot of ready-to-use implementations, and you will not notice the difference between a native JSON type and a JSON UDT. If this is fast enough for most .NET apps, it might be fine for processing in SQL Server. If you believe that the JSONB format from PostgreSQL or some compressed format like zipped JSON text is a better option, you can parse JSON text in the UDT, store it as JSONB in a binary property of the CLR UDT, and create member methods that use properties of that format. Currently, we have not found that anyone has even tried to create a CLR UDT that encapsulates the JSONB format, so we will not have that kind of experiment in this version.

Our focus will be on providing good functionality and query optimization, not on storage. We know that PostgreSQL has a native type and JSONB support, but we still don't know whether that is faster or better than the CLR alternatives, so in this version we want to focus on the other things that are more important (would you want to see SQL Server with a native type but without built-in functions that handle JSON? I don't think so :) ). However, we are open to suggestions, and if you believe that a native type that could replace CLR JSON or plain text would help, you can create a request on the Connect site so we can discuss it there. Again, we chose to start with FOR JSON and OPENJSON because these are the functionalities requested in the JSON Connect item, and probably the only ones that cannot be easily implemented with CLR.

So, our focus is on export/import and some built-in functions for JSON processing. Someone might say this will not be fast enough, but we will see. We will talk about performance once we ship these functionalities, and improve performance if needed. However, be aware that a built-in JSON parser is the fastest way to process JSON in the database layer. You might use a CLR type or CLR parsers as external assemblies, but this will not be better than native code that parses JSON.

Let’s see a short overview of these features in the following sections.

Exporting data as JSON - FOR JSON

The first feature that will be available in SQL Server 2016 CTP2 is the ability to format query results as JSON text using the FOR JSON clause. If you are familiar with the FOR XML clause, you will easily understand FOR JSON:

 SELECT column, expression, column AS alias
 FROM table1, table2, table3
 FOR JSON [AUTO | PATH]

When you add the FOR JSON clause at the end of a T-SQL SELECT query, SQL Server will take the results, format them as JSON text, and return the text to the client. Every row will be formatted as one JSON object, values in cells will be generated as values of the JSON object, and column names or aliases will be used as key names. We have two kinds of FOR JSON clause:

  • FOR JSON PATH enables you to define the structure of the output JSON using column names/aliases. If you put dot-separated names in the column aliases, JSON properties will follow that naming convention.
    This feature is similar to FOR XML PATH, where you can use slash-separated paths.
  • FOR JSON AUTO automatically creates nested JSON sub-arrays based on the table hierarchy used in the query, again similar to FOR XML AUTO.
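
As a sketch of the dot-separated naming convention in PATH mode (the table and column names here are hypothetical), aliases that share a prefix are grouped into one nested object:

```sql
-- Hypothetical table; dot-separated aliases create a nested "Order" object
SELECT OrderNumber AS [Order.Number],
       OrderDate   AS [Order.Date]
FROM SalesOrders
FOR JSON PATH
-- Expected shape: [{"Order":{"Number":"...","Date":"..."}}, ...]
```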

If you have used PostgreSQL JSON functions and operators, you will notice that the FOR JSON clause is equivalent to JSON creation functions such as row_to_json or json_object. The main purpose of FOR JSON is to create JSON based on variables and columns and format it according to the JSON specification, e.g.:

SET @json = ( SELECT 1 AS firstKey, getdate() AS dateKey, @someVar AS thirdKey FOR JSON PATH)

-- Result is: [{ "firstKey": 1, "dateKey": "2016-06-15T11:35:21", "thirdKey": "Content of variable" }]

FOR JSON is a good choice if you want to:

  • Serialize an array of objects that should be returned to the client. Imagine that you are creating a JSON web service that returns suppliers and their products (e.g. using the $expand option in OData services). You could search for suppliers, format each of them as JSON text, and then send a separate query to get the products for each supplier, formatted as a JSON array and attached to the main supplier object. Another solution might be to join suppliers and products and format the results on the client side (EF will probably send separate queries). With the FOR JSON clause you can join these two tables, add any predicate you want, and the formatting will be done in the database layer.
  • De-normalize parent-child tables and put all child records in one cell of the parent row, formatted as a JSON array. For example, if you don't want separate SalesOrderHeader and SalesOrderDetails tables, you can format the order details for each header as a JSON array stored in a single cell of the SalesOrderHeader table.
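
The join scenario above can be sketched with FOR JSON AUTO, which nests the joined child rows as a sub-array under each parent row (the table and column names are hypothetical):

```sql
-- Hypothetical header/detail tables; AUTO mode nests detail rows
-- under each header row, keyed by the table alias "d"
SELECT h.OrderNumber, h.OrderDate,
       d.ProductName, d.Quantity
FROM SalesOrderHeader AS h
JOIN SalesOrderDetail AS d ON d.OrderId = h.OrderId
FOR JSON AUTO
-- Expected shape:
-- [{"OrderNumber":"...","OrderDate":"...",
--   "d":[{"ProductName":"...","Quantity":2}, ...]}, ...]
```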

Transform JSON text to relational table - OPENJSON

OPENJSON is a table-valued function (TVF) that seeks into JSON text, locates an array of JSON objects, iterates through the elements of the array, and generates one row in the output result for each element. This feature will be available in CTP3. JSON text that is generated with the FOR JSON clause can be transformed back to relational form using OPENJSON. We will have the following types of OPENJSON TVF:

  • OPENJSON with a predefined result schema that enables you to define the schema of the returned table, as well as mapping rules that specify which properties will be mapped to the returned columns.
  • OPENJSON without a return schema, where the result of the TVF is a set of key-value pairs.
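
The schema-less variant can be sketched as follows (the sample JSON is hypothetical): without a column definition, each property of the object becomes one key/value row in the result:

```sql
-- Without a schema, OPENJSON returns one key/value row per property
DECLARE @json NVARCHAR(MAX) = N'{"Number":1, "Customer":"Adventure Works"}';

SELECT [key], [value]
FROM OPENJSON(@json);
-- Expected rows: (Number, 1) and (Customer, Adventure Works)
```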

An example of the OPENJSON function in a T-SQL query is shown below:

 SELECT Number, Customer, Date, Quantity
 FROM OPENJSON (@JSalesOrderDetails, '$.OrdersArray')
 WITH (
        Number varchar(200),
        Date datetime,
        Customer varchar(200),
        Quantity int
 ) AS OrdersArray

 @JSalesOrderDetails is a text variable that contains an array of JSON objects in the OrdersArray property, as shown in the following example:

'{"OrdersArray": [
   {"Number":1, "Date": "8/10/2012", "Customer": "Adventure works", "Quantity": 1200},
   {"Number":4, "Date": "5/11/2012", "Customer": "Adventure works", "Quantity": 100},
   {"Number":6, "Date": "1/3/2012", "Customer": "Adventure works", "Quantity": 250},
   {"Number":8, "Date": "12/7/2012", "Customer": "Adventure works", "Quantity": 2200}

OPENJSON will find an array in this property and return one row for each JSON object (i.e., each element of the array). The four columns in the result set are defined in the WITH clause. OPENJSON will try to find the properties Number, Date, Customer, and Quantity in each JSON object and convert their values into columns in the result set. By default, NULL is returned if a property is not found. This is the simplest example, but OPENJSON will be able to parse different formats of JSON text (e.g. nested JSON objects).

These functionalities are similar to the current OPENXML function – even the syntax is almost identical. The most important difference is that OPENXML works with XML handles as an argument, while OPENJSON will work directly with JSON text.

If you are familiar with the PostgreSQL JSON functions, you will notice that OPENJSON is similar to the json_each, json_object_keys, json_populate_record, and json_populate_recordset functions. The most important difference is that we will have only two forms of the OPENJSON function that do the same things as PostgreSQL's more specialized functions.

When would you use OPENJSON?

Imagine that you are importing JSON documents into the database and you want to load them into a table. Instead of parsing JSON on the client side and streaming a set of columns to the table, you can send the JSON as-is and parse it in the database layer.

Another use case is analytics over JSON text. Since we will not introduce yet another language for JSON analysis, if you are storing JSON as text you can transform it into a temp table and apply standard T-SQL analytic functions and operators.
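
That pattern can be sketched by reusing the @JSalesOrderDetails variable from the earlier example: shred the array with OPENJSON, then aggregate with ordinary T-SQL:

```sql
-- Shred the JSON array into rows, then aggregate with standard T-SQL
SELECT Customer, SUM(Quantity) AS TotalQuantity
FROM OPENJSON(@JSalesOrderDetails, '$.OrdersArray')
WITH (
        Customer varchar(200),
        Quantity int
) AS Orders
GROUP BY Customer;
```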

Built-in functions for JSON processing

We will also provide a set of useful functions for parsing and processing JSON text. The JSON built-in functions that will be available in SQL Server 2016 CTP3 are:

  • ISJSON( jsonText ) checks whether the NVARCHAR input is properly formatted according to the JSON specification. You can use this function to create check constraints on NVARCHAR columns that contain JSON text.
  • JSON_VALUE( jsonText, path ) parses jsonText and extracts a scalar value on the specified JavaScript-like path (see below for some JSON path examples).
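
The check-constraint use case mentioned above can be sketched like this (the table and constraint names are illustrative):

```sql
-- Reject any value in the column that is not well-formed JSON
CREATE TABLE SalesOrderRecord (
    OrderNumber NVARCHAR(25) NOT NULL,
    JOrderDetails NVARCHAR(4000)
        CONSTRAINT CK_JOrderDetails_IsJson CHECK (ISJSON(JOrderDetails) > 0)
);
```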

So what is the syntax of the path used in the built-in functions? We are using a JavaScript-like syntax for referencing properties in JSON text. Some examples are:

  • '$' - references the entire JSON object in the input text,
  • '$.property1' – references property1 in the JSON object,
  • '$[5]' – references the element at index 5 in a JSON array,
  • '$.property1.property2.array1[5].property3.array2[15].property4' – references a complex nested property in the JSON object.

The dollar sign ($) represents the input JSON object (similar to the starting / in XPath). You can add any JavaScript-like property/array references after the context item to reference any nested property. A simple example of a query where these built-in functions are used is:

 SELECT t.Id, t.OrderNumber, t.OrderDate,
        JSON_VALUE(t.JOrderDetails, '$.Order.ShipDate')
 FROM SalesOrderRecord AS t
 WHERE ISJSON(t.JOrderDetails) > 0
   AND JSON_VALUE(t.JOrderDetails, '$.Order.Type') = 'C'

Again, if we compare this with PostgreSQL, you will notice that JSON_VALUE is equivalent to the json_extract_path_text function and the ->> and #> operators.

How to index JSON documents?

Currently we are not planning to add any custom JSON indexes. Since JSON is plain text, you can use existing B-tree or full-text search indexes for advanced search queries. For example, you can create a computed column using the JSON_VALUE function and create a standard index on that column, as shown in the following example:

 CREATE TABLE SalesOrderRecord (
  OrderNumber NVARCHAR(25) NOT NULL,
  JOrderDetails NVARCHAR(4000),
  Quantity AS CAST(JSON_VALUE(JOrderDetails, '$.Order.Qty') AS int),
  Price AS JSON_VALUE(JOrderDetails, '$.Order.Price')
 );

 -- index name is illustrative
 CREATE INDEX IX_SalesOrderRecord_Quantity
 ON SalesOrderRecord(Quantity)
 INCLUDE (Price);

The index will be used for any seek operation if the query optimizer notices that the JSON_VALUE function is used in the query with the identical <column, path> pair. We will see in practice whether this is sufficient or whether we need to implement some custom index.
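
As a sketch, a query like the following against the table above should be able to seek on the index, because its predicate uses JSON_VALUE with the same <column, path> pair (and the same cast) as the computed Quantity column:

```sql
-- The predicate matches the computed column's expression,
-- so the optimizer can use the index on Quantity
SELECT OrderNumber, Price
FROM SalesOrderRecord
WHERE CAST(JSON_VALUE(JOrderDetails, '$.Order.Qty') AS int) > 100;
```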

So this is a short overview of the new JSON functionalities. We will see more details in separate posts.