Delta Tables and access

Gopinath Rajee 646 Reputation points
2022-04-27T20:48:18.803+00:00

All,

Users create Delta tables in one workspace. How can I grant access to those same Delta tables to users in a different workspace?

Thanks,
Gopi

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 76,586 Reputation points Microsoft Employee
    2022-04-28T07:42:19.657+00:00

    Hello @Gopinath Rajee ,

    Thanks for the question and using MS Q&A platform.

    There are two cases to consider: external reads and external writes.

    External reads: Delta tables store data encoded in an open format (Parquet), allowing other tools that understand this format to read the data. However, since other tools do not support the Delta Lake transaction log, it is likely that they will incorrectly read stale deleted data, uncommitted data, or the partial results of failed transactions.
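    To make the external-read caveat concrete, here is a minimal sketch (not from the original answer) of a non-Delta tool reading a Delta table's underlying Parquet files directly with pyarrow; the mount path is a hypothetical example. Because the reader never consults the _delta_log, it can also pick up files that Delta has logically removed but not yet vacuumed.

    ```python
    import pyarrow.dataset as ds

    # Hypothetical path to the Delta table's storage location (e.g. a DBFS mount).
    table_path = "/dbfs/mnt/datalake/delta/events"

    # Discover and read every Parquet data file under the path. The _delta_log
    # directory is skipped only because pyarrow ignores "_"-prefixed paths by
    # default; the transaction log itself is never interpreted, so stale or
    # uncommitted files may be included in the result.
    dataset = ds.dataset(table_path, format="parquet")
    print(dataset.to_table().num_rows)
    ```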

    In cases where the data is static (that is, there are no active jobs writing to the table), you can use VACUUM with a retention of ZERO HOURS to clean up any stale Parquet files that are not currently part of the table. This operation puts the Parquet files present in DBFS into a consistent state such that they can now be read by external tools.
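    As a hedged sketch of that clean-up step (the table name is hypothetical, and this assumes you run it from a Databricks notebook where spark is predefined): the retention-duration safety check has to be disabled before Delta will accept a zero-hour retention.

    ```python
    # Allow VACUUM to use a retention shorter than the default 7 days.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

    # Remove all data files no longer referenced by the current table version,
    # leaving only Parquet files that external tools can safely read.
    spark.sql("VACUUM my_database.my_delta_table RETAIN 0 HOURS")

    # Re-enable the safety check afterwards.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "true")
    ```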

    However, Delta Lake relies on those older snapshots for the following functionality, which will fail if you run VACUUM with zero retention:

    • Snapshot isolation for readers: long-running jobs continue to read a consistent snapshot from the moment they started, even if the table is modified concurrently. Running VACUUM with a retention shorter than the length of these jobs can cause them to fail with a FileNotFoundException.
    • Streaming from Delta tables: streams read from the original files written into a table in order to ensure exactly-once processing. When combined with OPTIMIZE, VACUUM with zero retention can remove these files before the stream has had time to process them, causing it to fail (see the streaming sketch after this list).

    For these reasons, Databricks recommends using this technique only on static data sets that must be read by external tools.
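    For context on the streaming caveat, this is roughly what such a dependent stream looks like (paths are hypothetical): the stream keeps re-reading the source table's original data files, which is why vacuuming them away with zero retention, especially after OPTIMIZE has rewritten them, causes the stream to fail.

    ```python
    # Incrementally read the Delta table; the stream tracks and reads the
    # original data files recorded in the transaction log.
    events = spark.readStream.format("delta").load("/mnt/datalake/delta/events")

    # Write the stream out to another Delta table with exactly-once semantics.
    query = (
        events.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/datalake/checkpoints/events_copy")
        .start("/mnt/datalake/delta/events_copy")
    )
    ```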

    External writes: Delta Lake maintains additional metadata in a transaction log to enable ACID transactions and snapshot isolation for readers. To ensure the transaction log is updated correctly and the proper validations are performed, writer implementations must strictly adhere to the Delta Transaction Protocol. Delta Lake in Databricks Runtime provides ACID guarantees based on this protocol; whether a non-Spark Delta connector that writes to Delta tables can do so with ACID guarantees depends on the connector implementation, so check the integration-specific documentation for its write guarantees.
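    As an illustration only (not something the original answer prescribes), the open-source deltalake package (delta-rs) is one non-Spark connector that writes through the Delta transaction protocol; the path and data below are hypothetical, and its exact guarantees are documented by that project rather than by Databricks Runtime.

    ```python
    import pandas as pd
    from deltalake import write_deltalake

    df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

    # Appends a new commit to the table's _delta_log instead of just dropping
    # Parquet files into the directory, so Delta readers see the change atomically.
    write_deltalake("/tmp/delta/events", df, mode="append")
    ```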

    For more details, refer to Can I access Delta tables outside of Databricks Runtime? Also check out the Stack Overflow thread "How to access one databricks delta tables from other databricks", which addresses a similar issue and was answered by Alex Ott, a Sr. Resident Solutions Architect at Databricks.
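    A minimal sketch of the approach from that thread, with hypothetical storage paths and names: because a Delta table is just data files plus a transaction log in cloud storage, a second workspace that can reach the same storage account (via a mount, a service principal, or credential passthrough) can read the table by path or register its own metastore table over the same location.

    ```python
    # Hypothetical ADLS Gen2 path of the table created in the first workspace.
    path = "abfss://data@mystorageaccount.dfs.core.windows.net/delta/events"

    # Option 1: read the table directly by path from the second workspace.
    df = spark.read.format("delta").load(path)

    # Option 2: register a metastore table in the second workspace over the same files.
    spark.sql(f"CREATE TABLE IF NOT EXISTS shared_events USING DELTA LOCATION '{path}'")
    ```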

    Hope this helps. Please let us know if you have any further queries.

