Hi, @Konstantin Morozov Welcome to Microsoft Q&A thanks for posting your question.
It's a correct observation that running a zero sum on Citus could result in a non-zero answer, but only under concurrent load. Once the system settles the zero sum can be taken. This is a tradeoff we took in the development of Citus to favour speed over isolation guarantees in its default operation. This is a tradeoff that we have seen work best for most customers. Note that this is only a tradeoff in the zero sum calculation, and cannot cause double spending as long as applicable locks (eg. SELECT FOR UPDATE) are taken in a transaction while validating business logic. More details are explained in the docs: https://docs.citusdata.com/en/v12.1/develop/reference_dml.html?highlight=2pc#updates-and-deletion Many of such violations can be worked around in the application. Instead of keeping a balance you can change the application to create a ledger. The exact details are too much to discuss here, but should be looked into if your application is performing accounting.
To work around these isolation issues there are two approaches that can be taken, and which one is the best depends on the application you are creating.
- Whenever you need a zero-sum you can pause the mutating workload and wait for all 2pc transactions to fully finish. Once the system is settled you can safely take the zero-sum which should reflect the state of the world.
- if pausing the mutating workload manually is impossible you can do this from the database by taking an aggressive lock on the table you are trying to zero-sum on. The effect to the application is the same, it will pause the mutating workload from the database. Your application needs to be lenient to this higher increase in latency. To take a lock like this you can use the postgres LOCK TABLE ddl command: https://www.postgresql.org/docs/current/sql-lock.html . When locking a table in SHARE mode you will prevent any concurrent mutations to that table, which allows you, once the lock is acquired, to perform a zero sum calculation.
Be aware that from the moment you start acquiring the SHARE lock on the table, no new mutations will start. Once any ongoing transactions have finished you will have acquired the lock, still preventing mutations to happen. Throught the time you hold on to the share lock all mutations will be blocked. Once you have obtained the zerosum count you should immediately release the lock by finishing the transaction.
Regards
Geetha