moondaddy-8531 asked MelissaMa-msft edited

Import large json file (7GB+) into sql server

I have a large JSON file which is irregularly structured.

All top-level elements are unique names, like this:

[screenshot: 112230-image.png]

I tried this:

 SELECT *
 INTO Json1
 FROM OPENROWSET (BULK 'D:\SomeData.json', SINGLE_CLOB) as j

but get the exception: "Attempting to grow LOB beyond maximum allowed size of 2147483647 bytes."


However, this would work if I could import each of the elements you see into its own row. I think my code above is trying to insert all the data into one row and one column.

Then, once each element is in a separate row, I think I can read the JSON from there to finish the processing. Below is a sample of what the data looks like:

[screenshot: 112323-image.png]

Can someone please tell me the SQL syntax to import this file into a table where each top-level element lands in a new row?

Thank you.




sql-server-transact-sql

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@moondaddy-8531,

To begin with, JSON was created so that a web page could receive some lightweight data instead of making a round trip for the entire page. And look where we are now: a 7 GB irregularly structured JSON file.

Is it possible for you to get the same data as a file in XML format?
That would open up a few options:

  • SSIS XML Source Adapter is streamable.

  • XSLT 3.0 is streamable.

  • LINQ to XML has streamable options.


MelissaMa-msft answered moondaddy-8531 commented

Hi @moondaddy-8531,

Instead of reading the entire JSON file as a single value, you may want to parse it and return the elements in the file and their properties in rows and columns.

  INSERT INTO Json1
  SELECT [value]
  FROM OPENROWSET (BULK 'D:\SomeData.json', SINGLE_CLOB) AS j
  CROSS APPLY OPENJSON(BulkColumn);

The preceding OPENROWSET call reads the file as a single text value. OPENROWSET returns this value as BulkColumn and passes it to the OPENJSON function, which iterates through the array of JSON objects in BulkColumn and returns one object (one book, in the documentation's example) per row.

The OPENJSON function can parse the JSON content and transform it into a table or a result set. The following example loads the content, parses the loaded JSON, and returns the five fields as columns:

 INSERT INTO Json1
 SELECT book.*
 FROM OPENROWSET (BULK 'C:\JSON\Books\books.json', SINGLE_CLOB) AS j
 CROSS APPLY OPENJSON(BulkColumn)
 WITH (id NVARCHAR(100), name NVARCHAR(100), price FLOAT,
       pages_i INT, author NVARCHAR(100)) AS book;

You can refer to this link for more details.

Best regards,
Melissa


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.



Thank you @melissa.

I think the problem is that your example assumes the data has a structure. As I displayed in the screenshots, the data is inconsistent, and the top-level element is not something like "book" with an id associated with it. Each element is a unique name. This is bad JSON from a Firebase database.

I'm trying to stuff all the JSON for each top-level element into a row with a single column. Once I achieve that, I will attempt to loop through the rows and parse the JSON for each row, either in T-SQL or C#.

Looking at the screenshots I provided, can you please show some SQL that would work for them?

Thanks again.

MelissaMa-msft answered moondaddy-8531 commented

Hi @moondaddy-8531,

Thanks for your update.

I think the FILESTREAM feature may help you if you are on SQL Server 2008 or later.

Or you could try splitting the JSON file into smaller files, converting them to CSV files, and importing those CSV files into SQL Server, as mentioned in this article.
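For the split-and-convert route, here is a rough Python sketch of the CSV step, assuming each chunk of top-level elements has already been parsed into dicts. The field names and output file name are invented placeholders, not from the thread; only the two sample keys come from the question.

```python
import csv
import os
import tempfile

def records_to_csv(records, csv_path):
    """Write a list of dicts to one CSV, using the union of all keys as
    the header so that irregular records share one column layout."""
    fieldnames = sorted({k for rec in records for k in rec})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)

# demo with two irregular records; "status" and "plan" are placeholders
chunk = [
    {"key": "01299866-768C-4B44-99DC-76652D960D9B", "status": "active"},
    {"key": "username1@domain,com", "plan": "free"},
]
out = os.path.join(tempfile.gettempdir(), "chunk1.csv")
records_to_csv(chunk, out)
with open(out, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```

The union-of-keys header means a record missing a field simply gets an empty cell, which keeps the irregular records loadable by a single BULK INSERT column list.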

Best regards,
Melissa





Thanks again @melissa. I think FILESTREAM will dump all the data into one row and one column. I'm looking for each element in a new row. Do you know code for that?


Hi @moondaddy-8531,

I'm looking for each element in a new row or do you know code for that?

Could you please provide more details or one example of the expected output for the above? Thanks.

Best regards,
Melissa



Sure. For example, these are a few top-level elements, similar to the first screenshot above:

01299866-768C-4B44-99DC-76652D960D9B
01eb88cd-f633-4dd4-bdb7-2a629162e21d
username1@domain,com
username2@domain,com
03eb0e17-d870-4308-91e3-a624f5258d5f

Each of these has nested JSON elements, as shown in the second screenshot above. So for the first element above, I want to get its JSON, and all child JSON n levels down, as a single blob of JSON, and insert it into one row/column.


This JSON does not have a fixed schema, since it comes from Google's Firebase "schema-less" database [thanks, Google :( ], and the developer who designed it was very green and, I'm guessing, had little data experience.
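To make the expected output concrete, here is a minimal sketch of the mapping being described, using two of the sample keys above. The nested values are invented placeholders; the point is just the shape of the result: one (top-level key, JSON blob) pair per row.

```python
import json

# irregular document keyed by the unique top-level names
# (nested values here are invented placeholders)
doc = {
    "01299866-768C-4B44-99DC-76652D960D9B": {"profile": {"name": "a"}},
    "username1@domain,com": {"settings": {"theme": "dark"}},
}

# desired shape: one (top_level_key, json_blob) row per element
rows = [(key, json.dumps(subtree)) for key, subtree in doc.items()]
```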










MelissaMa-msft answered MelissaMa-msft edited

Hi @moondaddy-8531,

Thanks for your update.

I did a lot of research but failed to find any effective solution using OPENROWSET, OPENJSON, or other methods on the SQL Server side, because the schema of your JSON is not fixed.

You could also try using Newtonsoft.Json to read each top-level element into a JToken, convert the JToken to a string, store it in a DataTable, and then use the SqlBulkCopy class, which should be able to achieve your requirement.

If you still encounter any difficulties or unintelligible errors while coding, feel free to post a new question and add the C#-related tag 'dotnet-csharp'.

 // requires: using Newtonsoft.Json; using Newtonsoft.Json.Linq;
 using (FileStream s = File.Open(@"C:\...\test.json", FileMode.Open))
 using (StreamReader sr = new StreamReader(s))
 using (JsonReader reader = new JsonTextReader(sr))
 {
     while (reader.Read())
     {
         // deserialize only when a "{" starts a top-level member,
         // not the root object itself (which would load the whole file)
         if (reader.TokenType == JsonToken.StartObject && reader.Depth == 1)
         {
             JToken jObject = JToken.Load(reader);
         }
     }
 }
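As a cross-check of the same streaming idea in another language, here is a minimal Python sketch that peels one top-level member at a time off a big JSON object using json.JSONDecoder.raw_decode, so that only roughly one member is buffered in memory. This is an illustrative sketch, not code from the thread; it assumes the file is a single top-level object whose values are objects or strings (as in a Firebase export), and the sample document and chunk size are invented.

```python
import io
import json

def iter_top_level(stream, chunk_size=64 * 1024):
    """Yield (key, value) pairs from one huge top-level JSON object,
    buffering only roughly one member at a time."""
    dec = json.JSONDecoder()
    buf = stream.read(chunk_size)                 # assumes "{" is in the first chunk
    buf = buf[buf.index("{") + 1:]                # step inside the outer object
    while True:
        buf = buf.lstrip(" \t\r\n,")              # skip whitespace and the comma
        while not buf:                            # refill between members
            more = stream.read(chunk_size)
            if not more:
                return
            buf = more.lstrip(" \t\r\n,")
        if buf[0] == "}":                         # closing brace of the outer object
            return
        while True:
            try:
                key, end = dec.raw_decode(buf)            # the member name
                start = buf.index(":", end) + 1
                while buf[start] in " \t\r\n":
                    start += 1
                value, end = dec.raw_decode(buf, start)   # the member value
                break
            except (ValueError, IndexError):      # member not fully buffered yet
                more = stream.read(chunk_size)
                if not more:
                    raise                         # truncated document
                buf += more
        yield key, value
        buf = buf[end:]                           # drop the consumed member

# tiny demo on an in-memory sample; a real run would pass open(path)
sample = '{"k1": {"a": 1}, "username1@domain,com": {"b": 2}}'
pairs = dict(iter_top_level(io.StringIO(sample), chunk_size=8))
```

Each (key, json.dumps(value)) pair could then be inserted as one row of a staging table, matching the one-element-per-row goal.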

Best regards,
Melissa




Thank you @MelissaMa-msft

That sounds like it could work, but how would you implement "deserialize only when there's a '{' character in the stream"?

And there would have to be a mechanism to both start and stop, since we have 7 GB of text to deal with.


Hi @moondaddy-8531,

Thanks for your update.

""deserialize only when there's "{" character in the stream"?

The above is really a question about C#.

It is recommended to post a new question with all the details and the C#-related tag 'dotnet-csharp'; you will get more professional help from the many experts there.

By the way, in order to close this thread, please kindly accept the answers if they helped. Doing so will benefit all community members who have a similar issue. Your contribution is highly appreciated.

Thank you for understanding!

Best regards,
Melissa
