Concurrency Control mechanisms
Two-phase locking (pessimistic concurrency control) and optimistic concurrency control are two techniques used by databases to handle the various anomalies which can arise during concurrent execution of transactions. Avoiding conflicts (two-phase locking) requires locking to control access to shared resources. Detecting conflicts (e.g. optimistic locking / Multi-Version Concurrency Control) provides better concurrency at the expense of relaxing serializability.
- Two-phase locking / pessimistic concurrency control is the oldest concurrency control technique that guarantees a serializable and linearizable transaction schedule. This approach is called pessimistic because it assumes that another transaction might change the data between the read and the update. Pessimistic concurrency control should be used when contention is expected to be high, i.e. transactions will be interfering with each other most of the time. Two-phase locking uses shared locks for reading and exclusive locks for writing.
- Exclusive or write lock (X)
- If a transaction gets an exclusive lock on a target (table / row) then the target cannot be read or modified by other transactions. A transaction will typically acquire an exclusive lock for the DML statements DELETE, INSERT and UPDATE. If another transaction wants to acquire an exclusive lock, it will have to wait, as an exclusive lock can be imposed on a page or row only if there is no other shared or exclusive lock already imposed on the target. Hence uncommitted writes of a transaction cannot be overwritten by another transaction; e.g. if the account balance of a user is updated by a transaction, this uncommitted write cannot be overwritten by another transaction. In other words, dirty writes are blocked via locks.
- What is locked on UPDATE/DELETE? In InnoDB, UPDATE / DELETE set record-level locks on every index record that is scanned while processing the query. (In case of a table scan, all scanned rows are locked.) It does not matter whether there are WHERE conditions in the statement that would exclude the row: InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. Note that during UPDATE/DELETE, along with record-level locks, gaps are also locked, via gap locks or next-key locks, to block phantom rows. In InnoDB gap locks are only used at the REPEATABLE READ and SERIALIZABLE isolation levels. (Assume an employee table where the points column has values 300, 400, 500, 600. Then the query DELETE FROM employee WHERE points = 550 would lock the range 500-600, as this is the gap where points = 550 could potentially be inserted. If this gap were not locked, rows with points = 550 could exist at the end of the transaction even though it committed.) InnoDB calls the combination of a record lock and a gap lock on the gap before the index record a next-key lock. Gap locking also depends on various other factors, such as whether the index is unique; more on this later. You can read more here https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html , https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
- What is locked by INSERT?
- INSERT sets an exclusive lock on the inserted row. This lock is an index-record lock, not a next-key lock (that is, there is no gap lock) and does not prevent other sessions from inserting into the gap before the inserted row.
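- To make the UPDATE/DELETE gap-locking behaviour above concrete, here is a minimal two-session sketch. It assumes MySQL/InnoDB at the default REPEATABLE READ isolation level and the hypothetical employee table from the example.
```sql
-- Two-session sketch of gap locking on DELETE (MySQL/InnoDB,
-- default REPEATABLE READ).
CREATE TABLE employee (
  id     INT PRIMARY KEY AUTO_INCREMENT,
  points INT,
  KEY idx_points (points)
);
INSERT INTO employee (points) VALUES (300), (400), (500), (600);

-- Session 1: matches no rows, but still gap-locks the range 500-600
-- on idx_points, because that is where points = 550 could be inserted.
START TRANSACTION;
DELETE FROM employee WHERE points = 550;

-- Session 2: blocks until session 1 commits or rolls back,
-- because 550 falls inside the locked gap.
INSERT INTO employee (points) VALUES (550);

-- Session 1:
COMMIT;  -- session 2's INSERT now proceeds
```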
In many databases (for instance SQL Server), under all isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) exclusive locks are acquired for write operations (i.e. write-write conflicts are blocked by the database via exclusive locks; Oracle uses locks to enforce first-updater-wins). The difference between the isolation levels lies in the way shared (read) locks are acquired/released and the way gap locks are handled.
- The write lock has a higher priority than the read lock. When a resource is unlocked and there are lock requests waiting in the queue, the lock is granted to the first request in the write lock queue. If there is no request in the write lock queue, the lock is granted to a read request.
- Note that SELECT FOR UPDATE is a way to get an exclusive lock on the rows read, irrespective of the transaction isolation level (setting a different isolation level doesn't allow you to get around the locking).
- Locking can be achieved by selecting a suitable isolation level or, for more fine-grained control, via explicit locking. Typically lower isolation levels are selected when an explicit locking strategy is used.
- At the SERIALIZABLE isolation level a plain SELECT should acquire locks by itself, but some databases still need SELECT FOR UPDATE; i.e. the only portable way (across different RDBMSs) to achieve exclusive locking during reads is SELECT FOR UPDATE. In some systems locking is a side effect of concurrency control and you achieve the same result without specifying FOR UPDATE explicitly.
- MyISAM in MySQL (and several other old systems) does lock the whole table for the duration of a query.
- In databases which use MVCC (like Oracle, PostgreSQL, MySQL with InnoDB), a DML query creates a copy of the record (in one way or another) and generally readers do not block writers and vice versa. For these databases, a SELECT FOR UPDATE may be required (for instance, in MySQL SELECT FOR UPDATE is required to avoid the lost update problem at the REPEATABLE READ isolation level, where MVCC is used). InnoDB supports Multi-Version Concurrency Control (MVCC), hence it allows different snapshots of the data to be available to different transactions. This mechanism is used to implement the Read Committed and Repeatable Read isolation levels. (SELECT ... FROM is a consistent read, reading a snapshot of the database and setting no locks; this snapshot read by the transaction is not affected by other writes, hence the dirty read and non-repeatable read problems are solved.) However, if the transaction isolation level is set to Serializable, the engine uses a pessimistic locking mechanism, traditionally implemented using the two-phase locking protocol.
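- The lost update mentioned above can be sketched as a two-session transcript. A minimal sketch, assuming MySQL/InnoDB at REPEATABLE READ and a hypothetical account table with one row (1, 100):
```sql
-- Lost update under MVCC REPEATABLE READ, and the FOR UPDATE fix.
-- Assumed schema: account(id INT PRIMARY KEY, balance INT), row (1, 100).

-- Session 1:
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;        -- snapshot read: 100

-- Session 2:
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;        -- also reads 100, no lock taken
UPDATE account SET balance = 110 WHERE id = 1;   -- 100 + 10
COMMIT;

-- Session 1: based on its stale read of 100, overwrites session 2's work.
UPDATE account SET balance = 110 WHERE id = 1;   -- 100 + 10 again
COMMIT;                                          -- final balance 110, not 120

-- Fix: make the initial read a locking read in both sessions:
--   SELECT balance FROM account WHERE id = 1 FOR UPDATE;
-- The second session then blocks until the first commits, re-reads 110,
-- and writes 120, so no update is lost.
```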
- On commit or rollback the lock is released (regardless of isolation level).
- Shared or read lock (S)
- In a nutshell a shared lock blocks writers but allows other readers to acquire the same shared lock.
- When a transaction acquires a shared lock on a target, the target can be read by other transactions (which also acquire a shared lock on it) but cannot be modified. A shared lock can be held by several transactions at the same time on the same target (e.g. a row); in this way several transactions can share the ability to read the data. Note that while multiple transactions hold a shared lock on a target, none of them can modify it.
- If a transaction cannot acquire a shared lock on a target (because another transaction holds an exclusive lock on it), it will wait.
- In databases like PostgreSQL, FOR SHARE can be used for a shared/read lock.
- When the shared lock is released depends on the isolation level; for example, in MS SQL Server:
- At the READ COMMITTED isolation level, shared locks are released as soon as the SELECT query (read) completes.
- At the REPEATABLE READ and SERIALIZABLE isolation levels, the transaction holds a shared lock until the end of the transaction.
- Gap/range/predicate locks are used to block phantom reads, typically at the SERIALIZABLE isolation level; REPEATABLE READ would typically use row locks. Note that in InnoDB a SELECT FOR UPDATE also takes gap locks at REPEATABLE READ (in fact the SERIALIZABLE level behaves essentially as if locking reads such as SELECT FOR UPDATE were used everywhere). Gap locks are used to lock ALL THE GAPS in the index where matching records could potentially be inserted. If a lock is held on a gap, no other statement is allowed to INSERT, UPDATE or DELETE a record that falls within that gap. How locking happens depends on:
- predicate type (equality/range)
- if index is unique
- if key value exists
- For example, in MS SQL Server here is how range locks operate. Assume an employee table with a unique key id with values (1,2,3,4,5) and a points column with a non-unique index with values 300,400,500,600,700.
    - EQUALITY PREDICATE
        - If the key value exists
            - NON-UNIQUE INDEX
                - A range lock is taken on the requested key and on the next key (if no next key exists, it is infinity).
                - e.g. select * from employee where points = 500 for update
                - If there are records with points 300,400,500,600,700, new inserts with points = 500 can happen in the gap between 500 and 600. Here 600 is the next key, hence this gap must be blocked.
                - If there are records with points 300,400,500, new inserts with points = 500 can happen in the gap between 500 and infinity; the next key is infinity.
            - UNIQUE INDEX
                - In case of an equality predicate + unique index + key found, no range lock is taken.
                - select * from employee where id = 1 (as id = 1 exists, no range lock will be taken)
        - If the key value does not exist
            - A range lock is taken on the next key, for both unique and non-unique indexes.
    - RANGE PREDICATE
        - UNIQUE INDEX
            - Range locks are taken on all the key values in the range (when using BETWEEN) and a range lock on the next key that is outside the range.
        - NON-UNIQUE INDEX
            - Same as the unique index: range locks on all the key values in the range and on the next key outside the range. Read more here https://techcommunity.microsoft.com/t5/sql-server/range-locks/ba-p/383080 and here https://stackoverflow.com/questions/42545523/mysql-gap-lock-reasoning
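- A T-SQL sketch of the range-predicate case above, assuming the hypothetical employee table from the example:
```sql
-- T-SQL sketch: key-range locks under SERIALIZABLE (SQL Server).
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

SELECT * FROM employee WHERE points BETWEEN 400 AND 600;
-- Key-range locks are now held on the keys 400, 500 and 600 and on the
-- next key outside the range (700). A concurrent
--   INSERT INTO employee (points) VALUES (450);
-- from another session blocks until this transaction ends, so the
-- predicate "points BETWEEN 400 AND 600" stays stable.

COMMIT TRANSACTION;
```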
- Gap lock examples.
- SELECT * FROM test_table WHERE id > 100 FOR UPDATE;
- assume that id has a non-unique index and the id values present are 90 and 102
- after this query is fired, if the query is fired again, the gaps to be blocked are 90-102 (as 101 could be inserted) and 102-infinity
- delete from test_table where c = 1000; (table is empty)
- if this query is executed, the gap (-infinity to infinity) would be locked: if the query select * from test_table where c = 1000 were run again, the expectation would be that no rows are returned, hence the whole range has to be locked.
- select * from testdl where c = 500 for update; (c = 1000 exists )
- it has to be ensured that no new records with c = 500 appear if the query is run again, so all necessary gaps need to be locked. The gaps are (-infinity to 1000) and (1000 to infinity). Obviously new records with c = 500 won't be inserted into the second gap, but they could be inserted into the first gap, so we have to lock it.
- It is important to note that a lock is always attached to existing objects.
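- On MySQL 8.0 the locks held by an open transaction (such as the FOR UPDATE examples above) can be inspected from another session. A small sketch, assuming the test_table from the examples:
```sql
-- While the locking transaction is still open, inspect its locks
-- from another session (MySQL 8.0):
SELECT index_name, lock_type, lock_mode, lock_data
FROM   performance_schema.data_locks
WHERE  object_name = 'test_table';
-- LOCK_MODE distinguishes the lock kinds discussed above:
--   'X'             next-key lock (record + gap before it)
--   'X,GAP'         pure gap lock
--   'X,REC_NOT_GAP' record-only lock
```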
- Intention Locks
- Many databases support intention locks. Database objects are hierarchical in nature: a logical tablespace maps to multiple database files, which are built of data pages, each page containing multiple rows. For this reason, locks can be acquired on different database object types. InnoDB also permits table locking, and to allow locking at both table and row level to coexist gracefully, a series of locks called intention locks exist.
- An intention shared lock (IS) indicates that a transaction intends to set a shared lock.
- An intention exclusive lock (IX) indicates that a transaction intends to set an exclusive lock.
- Whether a lock is granted or not can be summarised as follows:
- An X lock is not granted if any other lock (X, S, IX, IS) is held.
- An S lock is not granted if an X or IX lock is held. It is granted if an S or IS lock is held.
- An IX lock is not granted if an X or S lock is held. It is granted if an IX or IS lock is held.
- An IS lock is not granted if an X lock is held. It is granted if an S, IX or IS lock is held.
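- A small sketch of why intention locks matter, using the matrix above: a row-level write puts an IX lock on the table, so a later table-level X request can be refused without scanning every row for row locks. MySQL/InnoDB assumed, with a hypothetical test_table.
```sql
-- Session 1: the UPDATE takes an IX lock on the table plus an X
-- record lock on the affected row.
START TRANSACTION;
UPDATE test_table SET c = c + 1 WHERE id = 1;

-- Session 2: a table-level write lock needs a table-level X lock.
-- Per the matrix above, X conflicts with the IX held by session 1,
-- so this statement waits until session 1 commits or rolls back.
LOCK TABLES test_table WRITE;
```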
- AUTO_INCREMENT handling: here is how auto-increment is handled in InnoDB.
- innodb_autoinc_lock_mode = 0 (“traditional” lock mode)
- In this lock mode, all “INSERT-like” statements obtain a special table-level AUTO-INC lock for inserts into tables with AUTO_INCREMENT columns. This lock is normally held to the end of the statement (not to the end of the transaction) to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of INSERT statements, and to ensure that auto-increment values assigned by any given statement are consecutive.
- innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)
- “Simple inserts” (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes. read more here https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html
- Points to note
- Auto-increment values are assigned sequentially. In the event of a rollback or deadlock you will find gaps in the auto-increment values (see the sketch after this list).
- Auto increment locks can and do impact concurrency and scalability of your database.
- If you need to scale writes beyond a single server and end up having to shard, auto-increment no longer produces unique keys.
- Read more here https://blog.pythian.com/case-auto-increment-mysql/
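- A quick sketch of the rollback gaps noted above (MySQL/InnoDB; hypothetical table t):
```sql
-- AUTO_INCREMENT values are not reused after a rollback.
CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v INT);

INSERT INTO t (v) VALUES (1);   -- id = 1
START TRANSACTION;
INSERT INTO t (v) VALUES (2);   -- id = 2 is allocated...
ROLLBACK;                       -- ...and is NOT returned on rollback
INSERT INTO t (v) VALUES (3);   -- id = 3

SELECT id, v FROM t;            -- rows (1,1) and (3,3): a gap at id = 2
```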
- Pessimistic concurrency control has the problem of lockout: an application user selects a record for update, and then leaves for lunch without finishing or aborting the transaction. All other users that need to update that record are forced to wait until the user returns and completes the transaction, or until the DBA kills the offending transaction and releases the lock.
- Pessimistic concurrency control has the problem of deadlock: users A and B are both updating the database at the same time. User A locks a record and then attempts to acquire a lock held by user B, who is waiting to obtain a lock held by user A. Both transactions go into an infinite wait state, the so-called deadly embrace or deadlock. Note that preserving lock order becomes the responsibility of the data access layer. Detailed lock tree hierarchies, through which developers are required to obtain locks in a pre-defined order, can resolve many of these problems, but this requires the database architect to identify all the possible transactions a new system will be required to make and ensure that they are catered for in the lock tree, as well as ensuring that all members of the development team are disciplined in following the tree. Even if the database architect correctly identifies all the transactions and the developers faithfully follow the lock tree, future releases of the software may add new transactions that cannot be easily integrated into the lock tree.
- Optimistic concurrency control (OCC)
- OCC proceeds on the assumption that the data being updated has not changed since the read. It does not lock records when they are read. Since no locks are taken out during the read, deadlocks are all but eliminated and performance improves, since transactions do not have to wait on each other's locks; lock management overhead is eliminated. Optimistic locking is a useful approach when concurrent record updates are expected to be infrequent or the locking overhead is high. Hence, while in pessimistic control you try to avoid the conflict by locking, in OCC you allow the conflict to occur but try to detect it upon committing your transaction. OCC assumes that multiple transactions can frequently complete without interfering with each other. At its most basic, a simplified explanation is:
- Every record created has a version number/timestamp/checksum.
- Each successful update leads to a new version for the record making older version numbers obsolete.
- A record is read along with its version number.
- If an update is attempted with an obsolete version number, it means the assumption that the data has not changed since the read is incorrect, and the transaction must be rolled back. If there is too much contention/collision there will be a lot of rollbacks; hence OCC is suitable when data contention is low.
- Note that it is the database that updates the time-stamp - not the application.
- Optimistic locking is a useful approach in a microservices architecture, as in the saga pattern transaction isolation is lost (the reads and writes are not part of the same database transaction).
Img reference: https://i.stack.imgur.com/DEdlF.png
- Hence optimistic concurrency control can solve the lost update problem.
- An important point: if two transactions both execute the update statement with the same incremented version number in the WHERE clause, then, since the update operation requires an exclusive lock, the updates run sequentially; one will succeed in updating and for the other no rows will be updated. Read more here: https://stackoverflow.com/questions/33254755/atomicity-of-optimistic-locking. Note that the query SET amount = amount - 40, version = 2 updates without reading first. This is the right way to do it, rather than reading to check whether the row version has changed. Also check https://dba.stackexchange.com/questions/77655/should-an-insert-cause-an-exclusive-lock-on-a-foreign-key.
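- A minimal optimistic-locking sketch of the scheme above, assuming a hypothetical account table with amount and version columns:
```sql
-- Assumed schema: account(id INT PRIMARY KEY, amount INT, version INT).

-- 1. Read the row together with its version.
SELECT amount, version FROM account WHERE id = 1;  -- say amount = 100, version = 1

-- 2. Write back conditionally: the version check and the version bump
--    happen in one atomic statement, with no read in between.
UPDATE account
SET    amount  = amount - 40,
       version = version + 1
WHERE  id = 1
AND    version = 1;

-- 3. If the affected-row count is 0, another transaction has already
--    bumped the version since our read: re-read and retry, or fail.
```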
- MVCC is an optimistic concurrency control mechanism. Reads and writes are isolated from each other without locking: reading does not block writing and writing does not block reading. Updates do not overwrite; a new version of the data is created. MVCC uses snapshot isolation: the transaction reads a personal snapshot of the database, taken at the start of the transaction (subsequent reads read the snapshot established by the first read), hence typically resolving dirty reads, non-repeatable reads and phantom reads. When the transaction concludes, it will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot (hence resolving lost updates).
- E.g. in MySQL InnoDB, SELECT ... FROM is a consistent read, reading a personal snapshot of the database taken at the start of the transaction and setting no locks.
- When an MVCC database needs to update a piece of data, it will not overwrite the original data item with new data, but instead creates a newer version of the data item. Thus there are multiple versions stored. As every database object maintains its own version history, the rollback process becomes much lighter, as it only requires switching from one version to the other.
- InnoDB supports Multi-Version Concurrency Control (MVCC), hence it allows different snapshots of the data to be available to different transactions. This mechanism is used to implement the Read Committed and Repeatable Read isolation levels. (SELECT ... FROM is a consistent read, reading a snapshot of the database and setting no locks; this snapshot is not affected by other writes, hence the dirty read and non-repeatable read problems are solved, yet this does not solve the lost update problem.) However, if the transaction isolation level is set to Serializable, the engine uses a pessimistic locking mechanism, traditionally implemented using the two-phase locking protocol. (Hence in InnoDB you don't need a SELECT FOR UPDATE in Serializable mode, but you do need it in REPEATABLE READ or READ COMMITTED if you want an exclusive lock on rows.)
- Oracle uses Snapshot Isolation (SI), a multi-version concurrency control algorithm .
- Oracle also supports explicit locking as well, using the SELECT FOR UPDATE SQL syntax.
- InnoDB uses MVCC in Read Committed and Repeatable Read.
- By default SQL Server uses locks for implementing all the isolation levels stipulated by the SQL standard.
- Snapshot isolation has to be explicitly turned on if required.
- PostgreSQL, like Oracle, uses the MVCC data access model, and it does not offer a 2PL transaction isolation implementation at all.
- Although PostgreSQL is seen as a pure MVCC model, locking is still required to prevent write-write conflicts or for explicit locking. SELECT FOR UPDATE is used to acquire an exclusive row-level lock, while SELECT FOR SHARE applies a shared lock instead.
Isolation levels
DB isolation levels give the user of a system the ability to trade off isolation guarantees for improved performance. The isolation levels described by the SQL:1992 standard are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The default isolation level varies between RDBMSs; the default for MySQL InnoDB is REPEATABLE READ. Note that the way ANSI thinks about it, if you eliminate dirty reads, non-repeatable reads, and phantom reads, you're serializable. This is wrong. It turns out that if you eliminate these three anomalies, what you have is a different isolation level called Snapshot Isolation, which is commonly used. (In fact, Oracle calls its implementation of snapshot isolation "serializable".) While most databases handle dirty reads, non-repeatable reads, and phantom reads as described in the SQL standard, the handling of other anomalies like lost updates and write skew varies from database to database. A major problem is that the standard does not define, nor provide correctness constraints on, one of the most popular reduced isolation levels used in practice: snapshot isolation (nor any of its many variants: PSI, NMSI, Read Atomic, etc.). Because the standard fails to define snapshot isolation, differences in the concurrency vulnerabilities allowed by snapshot isolation have emerged across systems. For instance, snapshot isolation should typically stop lost updates (when using first-updater-wins / first-committer-wins), but this is not the case in InnoDB.
- READ UNCOMMITTED
- The lowest isolation level: dirty reads (along with all the other anomalies) are possible.
- READ COMMITTED
- Dirty reads are blocked (queries only see committed data; uncommitted changes from other transactions are never visible).
- Non-repeatable reads, lost updates and phantom reads are NOT handled at this isolation level.
- This isolation level is the default for Oracle, SQL Server and PostgreSQL.
- When two-phase locking is used, writes typically use exclusive locks (if two transactions try to update the same row, the second transaction must wait for the first one to either commit or roll back).
- Some databases like InnoDB use MVCC at the READ COMMITTED isolation level.
- REPEATABLE READ
- Blocks dirty reads, non-repeatable reads and lost updates.
- Does not handle phantom reads. Typically, when 2PL is used, row locks are used rather than range locks, hence phantom reads are not blocked.
- (There can be exceptions; e.g. in InnoDB, phantom rows are blocked by REPEATABLE READ.)
- How REPEATABLE READ is handled in InnoDB
- This is the default isolation level for InnoDB.
- Some databases like InnoDB use MVCC at the REPEATABLE READ isolation level.
- A transaction using REPEATABLE READ will perform reads as if they were run at the same point in time ("snapshotted") as the very first read in the transaction. This allows a consistent view of the database across queries without running into "phantom rows": records that appear or disappear on subsequent reads. Similarly, dirty reads and non-repeatable reads are also blocked (see the sketch below).
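- A sketch of this snapshot behaviour (MySQL/InnoDB REPEATABLE READ; the employee table from earlier assumed):
```sql
-- Session 1: the first read establishes the snapshot.
START TRANSACTION;
SELECT COUNT(*) FROM employee;  -- say 4

-- Session 2 (autocommit):
INSERT INTO employee (points) VALUES (800);

-- Session 1: still reads from its snapshot -- no phantom row.
SELECT COUNT(*) FROM employee;  -- still 4
COMMIT;
SELECT COUNT(*) FROM employee;  -- now 5
```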
- SERIALIZABLE
- Blocks dirty reads, non-repeatable reads, lost updates and phantom reads.
- InnoDB uses pessimistic locking at this isolation level.
Handling isolation anomalies via 2 phase locking
Note that in different RDBMS there may be some variation in way 2 phase locking is implemented.
In brief, the anomalies defined by the SQL standard are:
- Dirty reads, prevented by Read Committed, Repeatable Read and Serializable isolation levels
- Non-repeatable/fuzzy reads, prevented by Repeatable Read and Serializable isolation levels
- Phantom reads, prevented by the Serializable isolation level
The following anomalies are not covered by the SQL standard, and their handling by various databases is non-standard:
- dirty write
- read skew
- write skew
- lost update
- Write followed by read or write (Transaction 1 writes and transaction 2 tries to read or modify written data)
- Uncommitted data written by transaction 1 is read by transaction 2. This is called a DIRTY READ.
- A dirty read can happen at the READ UNCOMMITTED isolation level:
- Typically in 2-phase locking, trans 1 (the writer) acquires an exclusive lock on the written record (irrespective of isolation level).
- Trans 2 (the reader) does NOT try to get a shared lock on the data, hence it can read the dirty/uncommitted data.
- Dirty reads are blocked at the READ COMMITTED, REPEATABLE READ, and SERIALIZABLE isolation levels:
- Typically in 2-phase locking, trans 1 (the writer) acquires an exclusive lock on the written record (irrespective of isolation level).
- Trans 2 (the reader) tries to get a shared lock on the data, which fails as trans 1 holds an exclusive lock.
- Uncommitted data written by transaction 1 is modified by transaction 2. In other words, a dirty write happens when two concurrent transactions are allowed to modify the same row at the same time. Although the SQL standard does not mention this phenomenon, even the lowest isolation level (Read Uncommitted) prevents it.
- Trans 1 holds an exclusive lock on the data (at all isolation levels) and trans 2 fails to get an exclusive lock. Hence dirty writes are blocked at all isolation levels.
- Read followed by write or read (Transaction 1 reads data and transaction 2 modifies or reads data )
- Transaction 1 reads data and transaction 2 modifies the data read by trans 1.
- this can happen at the READ UNCOMMITTED isolation level, as
- trans 1, which reads the data, does not acquire a shared lock
- trans 2 can acquire an exclusive lock on the data and modify it.
- this can happen at the READ COMMITTED isolation level, as
- trans 1, which reads the data, acquires a shared lock while reading but releases the lock as soon as the read is over
- trans 2 can acquire an exclusive lock on the data and modify it.
- this cannot happen at the REPEATABLE READ and SERIALIZABLE isolation levels, as
- trans 1, which reads the data, acquires a shared lock while reading and holds it until the end of the transaction.
- trans 2 will be unable to get an exclusive lock as trans 1 is holding a shared lock.
- Transaction 1 reads data and transaction 2 reads the same data
- If two transactions read the same data then it can lead to the lost or buried update problem.
- Let us say two transactions both want to increase a certain value by 1 (from the current value of 10).
- trans 1 acquires a shared lock and reads the value as 10.
- trans 2 acquires a shared lock and reads the value as 10.
- trans 1 increases the counter to 11 and updates.
- trans 2 increases the counter to 11 and updates, leading to a lost / buried update.
- In READ UNCOMMITTED this can happen because the read does not lead to a shared lock.
- In READ COMMITTED this can happen because, although a shared lock is acquired, it is released as soon as the select is over.
- In REPEATABLE READ and SERIALIZABLE this will NOT happen, as there will be a deadlock:
- trans 1 gets a shared lock on selecting the row, which is released only at the end of the transaction.
- trans 2 gets a shared lock on selecting the row, which is released only at the end of the transaction.
- trans 1 tries to get an X lock to update the row but cannot, because of the shared lock held by trans 2.
- trans 2 tries to get an X lock to update the row but cannot, because of the shared lock held by trans 1.
- the deadlock is detected and one of the transactions is terminated; the winning transaction succeeds in updating.
- Hence in 2PL the lost / buried update problem will not happen in REPEATABLE READ and SERIALIZABLE (see the transcript below).
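- The deadlock sequence above as a transcript; a sketch assuming a 2PL database such as SQL Server and a hypothetical counter table with row (1, 10):
```sql
-- Session 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT val FROM counter WHERE id = 1;     -- S lock, reads 10, held to txn end

-- Session 2 (concurrently):
--   SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
--   BEGIN TRANSACTION;
--   SELECT val FROM counter WHERE id = 1;   -- second S lock, also reads 10

-- Session 1: needs an X lock, blocked by session 2's S lock.
UPDATE counter SET val = 11 WHERE id = 1;

-- Session 2: needs an X lock, blocked by session 1's S lock -> deadlock.
--   UPDATE counter SET val = 11 WHERE id = 1;
-- The database picks a victim and rolls it back; the surviving
-- transaction's update succeeds.
```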
- Repeatable-read related (trans 1: reads data, trans 2: updates data, trans 1: reads data)
- Non-repeatable/fuzzy reads (the same query, fired twice in the same transaction, returns different values).
- This problem is solved at the REPEATABLE READ and SERIALIZABLE isolation levels, as the shared lock is released at the end of the transaction, not after the read is over.
- trans 1 gets an (S) lock and retrieves rows via query Q.
- trans 2 tries to get an (X) lock but has to wait, as the (S) lock is not released after the read (it is released at the end of the transaction).
- trans 1 fires the query Q again and fetches the same rows.
- Note that Some ORM frameworks (e.g. JPA/Hibernate) offer application-level repeatable reads. The first snapshot of any retrieved entity is cached in the currently running Persistence Context. Any successive query returning the same database row is going to use the very same object that was previously cached. This way, the fuzzy reads may be prevented even in Read Committed isolation level.
- Phantom reads: the SQL standard says that a phantom read occurs if two consecutive query executions render different results because a concurrent transaction has modified the range of records in between the two calls. A phantom read is similar to a non-repeatable read; however, in a phantom read the number of rows returned is not the same. In other words, if a transaction makes a business decision based on a set of rows satisfying a given predicate, then without range/predicate locks a concurrent transaction might insert a record matching that particular predicate.
- trans 1 fires query select * from employee where salary > 100 and gets 10 records
- trans 2 inserts an employee into the employee table with salary > 100.
- trans 1 fires the query select * from employee where salary > 100 again and gets 11 rows instead of 10.
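- The same sequence as SQL; a sketch assuming a 2PL database at an isolation level below SERIALIZABLE (row locks only, no range locks):
```sql
-- Transaction 1:
BEGIN TRANSACTION;
SELECT * FROM employee WHERE salary > 100;   -- 10 rows

-- Transaction 2 (concurrently, commits immediately):
--   INSERT INTO employee (name, salary) VALUES ('new hire', 150);
--   COMMIT;

-- Transaction 1: the row locks taken so far do not cover the gap,
-- so the insert above was not blocked.
SELECT * FROM employee WHERE salary > 100;   -- 11 rows: a phantom row
COMMIT;
```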
- Phantom reads are blocked by the SERIALIZABLE isolation level, as it forces queries with range predicates to acquire range (MS SQL) / gap (MySQL) / predicate locks to avoid phantom reads.
- For instance, InnoDB locks all rows found with an exclusive lock and the gaps between them with a shared gap lock.
- Note that gap locks can block inserts in a table hence should be used with caution.
- Note the Stack Overflow questions above discussing doubts about gap locks.
- Write skew
- Under 2PL, the first transaction can acquire shared locks on both entries, thereby preventing the second transaction from updating these records.
Handling isolation anomalies via MVCC
- Dirty read
- Non-repeatable read
- Phantom read
- As per the SQL standard, phantom reads are blocked by the SERIALIZABLE isolation level. Some databases like InnoDB use MVCC at the REPEATABLE READ isolation level.
- A transaction using REPEATABLE READ in InnoDB will perform reads as if they were run at the same point in time ("snapshotted") as the very first read in the transaction. This allows a consistent view of the database across queries without running into "phantom rows": records that appear or disappear on subsequent reads. Similarly, dirty reads and non-repeatable reads are also blocked. However, note that although there are no phantom reads, unlike with 2PL the rows actually are inserted, i.e. the range of records is modified, so the outcome is not always the same as with a 2PL implementation.
- Lost update problem.
- This depends on the implementation. In MySQL, MVCC is used at the Read Committed and Repeatable Read isolation levels. To prevent lost updates the transaction scheduler needs to track modification of read records, but InnoDB does not do that. Hence for InnoDB, either the Serializable isolation level or SELECT FOR UPDATE in the other modes has to be used. Note, however, that most implementations of MVCC where the first updater wins do block the lost update problem. (Read more here: https://clarifyforme.com/posts/5750334462885888/Snapshot-isolation)
Critical points to remember
- Serializable is the highest isolation level, but even this does not block all isolation anomalies in every database (for instance, Oracle's serializable, being snapshot isolation, does not block write skew).
- The Serializable isolation level is also the worst from a performance standpoint.
- The transaction isolation level is selectable per transaction, and this should be used to pick, per transaction, the isolation level that is best for performance yet does not cause isolation anomalies.
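- For example, in MySQL the level can be set for just the next transaction (syntax varies slightly between RDBMSs):
```sql
-- Applies only to the next transaction started in this session:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- ... statements here run at READ COMMITTED ...
COMMIT;

-- Later transactions revert to the session/global default
-- (REPEATABLE READ in InnoDB unless configured otherwise).
```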
- When you choose a transaction isolation level you get a well-defined isolation behaviour (based on 2PL or MVCC); however, an explicit locking strategy can be used when you want to work at lower isolation levels yet avoid isolation anomalies which could arise in the transaction. Explicit locking overrides the isolation level and whether MVCC is being used. In fact, by choosing an isolation level you are only choosing a locking strategy.
FAQ
Question : In MySQL, is SELECT FOR UPDATE required at the REPEATABLE READ isolation level to solve the lost update problem?
Yes. The SQL standard says nothing about lost updates because the standard was designed when 2PL (two-phase locking) was the concurrency control mechanism. If you use 2PL, then the Repeatable Read isolation level will indeed prevent a lost update. (As explained in this article, if two transactions have read the data, then the reads acquire shared locks which are released only on transaction completion; hence if either transaction tries to write, its attempt to acquire an exclusive lock will fail.) However, MVCC can provide repeatable reads via multiple versions of a tuple; but in order to prevent lost updates it also needs the transaction scheduler to track tuple modifications for the records read by a certain transaction. Apparently, InnoDB does not work like that. Shouldn't MySQL's MVCC prevent the lost update using database-level pessimistic locking, resulting in a transaction rollback? Well, MVCC does not use any pessimistic locking in Repeatable Read. The only locks taken are the gap and next-key locks taken on the clustered index, but those don't prevent lost updates. MySQL uses pessimistic locking for Serializable only, which provides a 2PL concurrency control model, even when using the MVCC-based InnoDB storage engine. Hence the Serializable isolation level will solve the lost update problem.
In a nutshell, the lost update anomaly is not part of the standard like the other anomalies (dirty read, non-repeatable read, phantom read). 2PL (e.g. in MS SQL) addresses this issue when read locks are acquired for reads (and held until the end of the transaction). MVCC without locks should theoretically resolve lost updates, but does not always do so; for example, in MySQL InnoDB, 2PL is only used for Serializable.
Read more here https://forums.mysql.com/read.php?22,56420,57733.
Question : What is a consistent read? How does it work in InnoDB?
A read operation that uses snapshot information to present query results based on a point in time, regardless of changes performed by other transactions running at the same time. With the REPEATABLE READ isolation level, the snapshot is based on the time when the first read operation is performed. With the READ COMMITTED isolation level, the snapshot is reset to the time of each consistent read operation. Consistent read is the default mode in which InnoDB processes SELECT statements at the READ COMMITTED and REPEATABLE READ isolation levels. Because a consistent read does not set any locks on the tables it accesses, other sessions are free to modify those tables while a consistent read is being performed.
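A sketch of the difference, assuming a hypothetical table t (MySQL/InnoDB):
```sql
-- Session 1, READ COMMITTED: the snapshot is refreshed per statement.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM t;   -- say 1

-- Session 2 (autocommit): INSERT INTO t (v) VALUES (99);

-- Session 1:
SELECT COUNT(*) FROM t;   -- 2: a fresh snapshot for each consistent read.
COMMIT;
-- Under REPEATABLE READ the second SELECT would still have returned 1.
```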
Question : Does snapshot isolation block the phantom read anomaly?
The 2PL-based Serializable isolation prevents Phantom Reads through the use of predicate locking while MVCC (Multi-Version Concurrency Control) database engines address the Phantom Read anomaly by returning consistent snapshots.
Question : Does snapshot isolation block the read skew anomaly?
- Yes. Since every read in the transaction comes from the same consistent snapshot, the transaction cannot observe two related values from different points in time, hence read skew is prevented.
Question : In InnoDB, gap locks are only used at the REPEATABLE READ and SERIALIZABLE isolation levels, and REPEATABLE READ uses MVCC. Does this imply that InnoDB is using some form of first-updater-wins rather than first-committer-wins?
- Neither, as discussed above: InnoDB's REPEATABLE READ does not detect write-write conflicts at all. Updates simply block on row locks, and a lost update can still occur unless SELECT FOR UPDATE or the SERIALIZABLE level is used.
Question : Do inserts create table lock?
No, read more here https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html