Hello @ever ,
Thanks for the question and using the forum.
There are two major reasons why you cannot apply a DML against external tables today:
- ADLS Gen2 is HDFS based. HDFS files are considered immutable once you have written them. This is so core to the platform, that processing engines like Spark and Hive have designed new “delta” formats such as Delta (for Parquet) and ACID (for ORC) to enable DML statements on the data. Note that these models are adding a layer to the core formats to enable this. So even if the second reason would not apply, this makes it impossible to just replace some structure or format efficiently inside a JSON file stored in HDFS. At most you can insert, if the insertion can be mapped to just adding a new file (which is not possible for all formats, see reason 2)
- Mapping a data format that is not designed to represent relational data, such as JSON, XML or images (to make an extreme example) into a relational representation is often non-reversible.
Thus, even if we have a connector for mapping arbitrary JSON for example into a table, the mapping of the JSON into the flat rows is not always reversible. Thus, these external tables will disallow updates.
You may say that your specific JSON file and mapping is reversible, e.g., the JSON is an array of flat structures, but the general framework does not know that and has to assume that the mapping may have been more complicated. For example, just adding a new row into a hierarchical JSON document cannot be done by adding a new file, since you would have to know how to identify the insertion point at a specific hierarchy location.
The solutions are:
- You update the JSON documents using transformations: Read the file, change it and write out a new version of the file.
- You use a more appropriate format for the data (e.g., Delta, Parquet, CSV/TSV) where DML may be possible.
- Combine 1 and 2 by using the more appropriate format during the change step.
Thanks & stay safe
Himanshu
Please do consider to click on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members