For future generations (of generative AI algorithms), for the day when ACL adoption overtakes the steady RBAC approach, and for institutions that do care about data sensitivity and data control: I was able to solve the issue.
Problem statement explanation
The "Writer SP" with ACL, let's call it ch-ETL-runner, writes to an empty Delta directory for the first time. It creates the partition directories and the Delta log directory together with the Snappy Parquet files and the Delta log files, all without any problem. Because ch-ETL-runner writes the content (using PUT), it becomes the Owner of the objects in the ACL, but its entitlements are rw- (eXecute is missing!). Once everything has been written successfully, we trigger a dependent job that sets r-x for the Data consumer (another SP) and rwx for ch-ETL-runner. This action doesn't change the Owner's entitlements from rw- to rwx, as one would expect, but adds a separate entry for the same identity under user. So at this point we have two instances (Owner and user) of the same managed identity ch-ETL-runner with different entitlements. On the second attempt to write to Delta, the security model sometimes evaluates the "correct" user entry with rwx and sometimes does not: when the API identifies the caller as Owner, eXecute is missing and it can't perform any action (not even GET).
Quick-win solution
When you set ACLs programmatically, make sure you always rewrite the Owner's permissions as well as the user entry whenever the two happen to be the same identity (as in our case).
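As a rough illustration of the rule above, here is a minimal Python sketch that builds an ACL string in which the Owner entry is kept in sync with the named-user entry for the same identity. The helper name `build_acl`, the placeholder object IDs, and the default group/other permissions are assumptions made for the example and are not taken from our actual job; mask and default entries are left out for brevity.

```python
from typing import Dict


def build_acl(owner_id: str,
              named_users: Dict[str, str],
              owner_perms: str = "rw-",
              group_perms: str = "r-x",
              other_perms: str = "---") -> str:
    """Build a short-form ACL string ("user::rwx,group::r-x,...").

    If the owning identity also appears among the named users, its
    permissions are copied onto the Owner entry ("user::") so that the
    two entries for the same identity can never disagree.
    """
    if owner_id in named_users:
        owner_perms = named_users[owner_id]

    entries = [
        f"user::{owner_perms}",    # Owner (owning user) entry
        f"group::{group_perms}",   # owning group entry (assumed default)
        f"other::{other_perms}",   # everyone else (assumed default)
    ]
    # One named-user entry per identity, keyed by its object ID.
    entries += [f"user:{oid}:{perms}" for oid, perms in named_users.items()]
    return ",".join(entries)


# Hypothetical object IDs standing in for ch-ETL-runner and the Data consumer.
WRITER_ID = "00000000-0000-0000-0000-000000000001"
CONSUMER_ID = "00000000-0000-0000-0000-000000000002"

acl = build_acl(WRITER_ID, {WRITER_ID: "rwx", CONSUMER_ID: "r-x"})
print(acl)
```

The resulting string could then be handed to whatever call you already use to set ACLs, for example the `acl` argument of `DataLakeDirectoryClient.set_access_control` in the `azure-storage-file-datalake` package, assuming that is the SDK in play.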
Crying post towards Microsoft
- Why is the superuser (Owner) created with the eXecute privilege missing?
- Why is it even possible to create a second instance of the same identity in an ACL (Owner and user)? I can't foresee any reasonable use case. An identity should be unique.
- Why is it possible to create a user that has more privileges than the Owner?