I saw the recent release of the Dataflow "Quick re-use" feature and immediately enabled it (the start-up time of pre-warmed clusters had been an issue for us for ages).
Unfortunately, I think this feature is buggy.
Ever since enabling this setting, I've intermittently had activities spontaneously hang and then fail with:
"The databricks job of Dataflow completed, but the runtime state is either null or still InProgress."
It's happened on a variety of dataflows that have nothing in common other than that they all use the same IR, which now has "Quick re-use" enabled.
In case it's of use, one such run would be:
Pipeline Run Id: 0198122b-6bb1-4250-9fcc-3d28417366e7
Activity Run Id: 1141af06-376a-4554-97da-c36a79278113
"dataFlowETag": "0e006849-0000-0c00-0000-6065e9da0000"
But I've had numerous other instances of it happening.