Parsing 4GB JSON with SQL Server
SQL Server 2016 and Azure SQL Database enable you to parse JSON text and transform it into tabular format. In this post, you might see that JSON functions can handle very large JSON text - up to 4GB.
First, I would need very large JSON document. I'm using TPCH database so I will export the content of lineitem table in a file. JSON can be exported using the bcp.exe program:
D:\Temp>bcp "select (select * from lineitem for json path)" queryout lineitems.json -d tpch -S .\SQLEXPRESS -T -w Starting copy... 1 rows copied. Network packet size (bytes): 4096 Clock Time (ms.) Total : 103438 Average : (0.01 rows per sec.)
The query will format all rows from lineitem table, format them as JSON text and return them as a single cell. I'm using Unicode format (-w flag). As a result, bcp.exe will generate 4.35 GB (4,677,494,824 bytes) file containing one big JSON array.
Now I will load the content of this file using OPENROWSET(BULK) and pass content to OPENJSON function that will parse it, take the values from l_discount key, and find the average value:
select avg([l_discount]) from openrowset(bulk 'D:\Temp\lineitems.json', SINGLE_NCLOB) f cross apply openjson(f.BulkColumn) with([l_discount] [money])
In my SQL Server 2016 Express edition this query is finished in 1min 53 sec.
Functions that can parse JSON in SQL Server 2016 do not have any constraint regarding the size of JSON document. As you might see in this example, I can successfully parse 4GB JSON document, which is 2x bigger than maximum size of NVARCHAR(MAX) that can be stored in tables.