Snapshot Isolation (SI),is a multi-version concurrency control algorithm introduced in and later implemented by Oracle.Note that if you eliminate dirty reads, non-repeatable reads, and phantom reads, you’re NOT serializable. It turns out if you eliminate these three anomalies, what you have is a new isolation level which they called Snapshot Isolation. This is becuase there are other isolation anamolies like READ SKEW, WRITE SKEW, LOST UPDATES
Snapshot isolation reads return values from a single instant of time when all committed transactions have completed their writes and no writes of non-committed transactions are visible.
- A transaction Ti executing under SI conceptually reads data (the snapshot) from the committed state of the database as of time start(Ti)
- Holds the results of its own writes in local memory store, so if it reads data it has written it will read its own output.
- Predicates evaluated by Ti are also based on rows and index entry versions from the committed state of the database at time start(Ti), adjusted to take Ti's own writes into account.
- Snapshot isolation also obeys first commiter /updater wins rule.
- The interval in time from the start to the commit of a transaction, represented [Start(Ti), Commit(Ti)], is called its transactional lifetime.
- We say two transactions T1 and T2 are concurrent if their transactional lifetimes overlap.
- Writes by transactions active after Ti starts, i.e., writes by concurrent transactions, are not visible to Ti.
- Note that both transactions continue to write.(creating multiple versions of same data).
- When Ti is ready to commit, it obeys the First Committer Wins rule, as follows:
- Ti will successfully commit if and only if no concurrent transaction Tk has already committed writes (updates) of rows or index entries that Ti intends to write.
- So let us say Ti is planning to update the acc balance of user U1 and if another trans Tk has allready update acc balance of user U1 then Ti will fail.
- Snapshot isolation is called SERILIAZBLE isolation in oracle, the first commiter wins is enforced not via commit time validation but checks done at time of update.
- If Ti and Tk are concurrent, and Ti updates the data item X, then it will take a Write lock on X; if Tk subsequently attempts to update X while Ti is still active,Tk will be prevented by the lock on X from making further progress. If Ti then commits, Tk will abort; Tk will only be able to continue if Ti drops its lock on X by aborting.
- If, on the other hand, Ti and Tk are concurrent, and Ti updates X but then commits before Tk attempts to update X, there will be no delay due to locking, but Tk will abort immediately when it attempts to update X (the abort does not wait until Tk attempts to commit).
- One of the concurrent transactions updating a data item aborts (if data set is intersecting/overlapping then abort will happen , if the data sets are disjoint then abort will not happen). Aborts by a transaction for this reason are known as serialization errors.
- In other words the database enforces that two committed transactions which are concurrent have disjoint write sets (meaning they don't write to any of the same memory locations). Any transaction whose commit would cause this restriction to be violated is forced to abort .
Advantages of snapshot isolation
- Reading from snapshot and the fact that T sees all the changes made by transactions that commit before start(T), and it sees no changes made by transactions that commit after start(T) means partial results of other trans are not visible , this solves dirty read (reads of uncommitted transaction which could be potentially rolled back) , non repetable read and phantom reads.
- If both trans try to modify same data , one of them aborts, this also solves lost update problem.
- lets say two transactions are trying to increase current acc balance of a user by 100.
- the correct result is that acc balance should go up by 200
- trans 1 reads current account balance say X.
- trans 2 reads current account balance say X. (both reads have read there snapshots).
- trans 1 updates the account balance to X+ 100.
- trans 2 updates the account balance to X+ 100.
- trans 1 commits and becomes the first committer.
- trans 2 will fail as first commiter wins.
- hence lost update problem is solved. interestingly innodb uses Snapshot isolation but still select for update is required to solve the lostupdate problem in innodb.
- By avoiding most locking, this approach can greatly increase concurrency
Disadvantages of snapshot isolation
While In general, snapshot isolation performs all reads of data as of a particular snapshot of the database state which contains only committed data. This snapshot remains constant throughout the lifetime of the transaction, so all reads are guaranteed to be repeatable (in addition to being only of committed data). Furthermore, concurrent transactions that write the same data detect conflicts with each other and typically resolve this conflict via aborting one of the conflicting transactions. This prevents the lost-update anomaly. However, conflicts are only detected if conflicting transactions write an overlapping set of data. If the write sets are disjoint, these conflicts will not be detected. Therefore snapshot isolation is vulnerable to the write skew anomaly. Some implementations are also vulnerable to the phantom read anomaly. Here is how write skew can happen.
- Suppose X and Y are data items in different rows representing checking account balances of a married couple at a bank,with a constraint that X+Y > 0
- the bank permits either account to be overdrawn, as long as the sum of the account balances remains positive.
- Assume that initially X0 = 70 and Y0 = 80.
- Under SI, transaction T1 reads X0 and Y0, then subtracts 100 from X, assuming it is safe because the two data itemsadded up to 150.
- Transaction T2 concurrently reads X0 and Y0, then subtracts 100 from Y, assuming it is safe for the same reason.
- Each update is safe by itself and since the updated data set is not overlapping the transaction will not abort.