A transaction is an atomic unit of work that is either completed in its entirety or not done at all
Transaction processing system: a system with large databases and hundreds of concurrent users executing database transactions
Examples of transaction processing systems include airline reservations, banking, credit card processing, online retail purchasing, stock markets, and supermarket checkouts
A transaction represents a logical unit of database processing that must be completed entirely for correctness
A transaction is typically implemented by a computer program and includes database commands such as retrievals, insertions, deletions, and updates
Concurrency can be achieved through interleaved processing (concurrent execution of processes in a single CPU) or parallel processing (concurrent execution in multiple CPUs)
Most concurrency control theory in databases is developed in terms of interleaved concurrency
A transaction is a set of logically related operations, containing a group of tasks performed by a single-user or multi-user to access database contents
Main operations of a transaction include Read(X), Write(X), and other operations like debit transactions
Commit operation is used to save work done permanently, while Rollback operation is used to undo work done
Transactions have four ACID properties: Atomicity, Consistency, Isolation, and Durability
Atomicity ensures all operations of a transaction take place at once or the transaction is aborted
Consistency maintains integrity constraints to keep the database consistent before and after the transaction
Isolation ensures data used during a transaction cannot be accessed by another transaction until the first one is completed
Durability indicates that changes made by a transaction are permanent and cannot be lost due to system failures
Transactions can be in states like Active, Partially-Committed, Committed, Failed, or Aborted
A schedule is a series of operations from one transaction to another, with types including Serial, Non-Serial, Serializable, Complete, and Recoverable schedules
Serial schedules execute one transaction completely before starting another, while non-serial schedules allow interleaving of operations
Serializable schedules allow transactions to execute concurrently without interfering with each other, identifying correct schedules with interleaving operations
Complete schedules end with either abort or commit operations, while recoverable schedules ensure each pair of transactions is recoverable
Complete schedule: If the last operations of each schedule are either abort or commit, then it is called a complete schedule
Recoverable Schedule: For each pair of transactions (Ti, Tj) such that Tj reads a data item previously written by Ti, the commit operation of Ti should appear before the commit operation of Tj
Cascadeless Schedule: For each pair of transactions (Ti, Tj) such that Tj reads a data item written by Ti, the commit operation of Ti should appear before the read operation of Tj
Strict Schedule: A value written by a transaction cannot be read or overwritten by another transaction until the transaction is either aborted or committed
Serialization Graph is used to test the Serializability of a schedule
Precedence graph:
Contains a pair G = (V, E), where V consists of a set of vertices representing transactions and E consists of a set of edges representing certain conditions
Conditions for creating edges Ti->Tj:
1. Create a node Ti->Tj if Ti executes write (Q) before Tj executes read (Q)
2. Create a node Ti->Tj if Ti executes read (Q) before Tj executes write (Q)
3. Create a node Ti->Tj if Ti executes write (Q) before Tj executes write (Q)
Testing of Serializability:
If a precedence graph contains a single edge Ti->Tj, then all instructions of Ti are executed before the first instruction of Tj
If a precedence graph contains a cycle, then the schedule is non-serializable; if no cycle, then the schedule is serializable
Conflict Serializable Schedule:
A schedule is conflict serializable if, after swapping non-conflicting operations, it can transform into a serial schedule
Conflicting operations:
Belong to separate transactions
Involve the same data item
Contain at least one write operation
View Serializable Schedule:
A schedule is view serializable if it is view equivalent to a serial schedule
View equivalent conditions:
1. Initial Read: Initial read operations in both schedules must be the same
2. Updated Read: If Ti reads A updated by Tj in S1, then Ti should read A updated by Tj in S2
3. Final Write: Final write operations must be the same in both schedules
Recoverability of Schedule:
Irrecoverable Schedule: Tj reads the updated value of Ti and Tj commits before Ti commits
Recoverable with Cascading Rollback: Tj reads the updated value of Ti, and commit of Tj is delayed till commit of Ti
Cascade Less Recoverable Schedule: T1 reads and writes A, commits, and T2 reads and writes A
Transaction Support in SQL:
Characteristics specified by SET TRANSACTION statement: access mode, diagnostic area size, isolation level
Access mode can be READ ONLY or READ WRITE; default is READ WRITE
Diagnostic area size specifies the number of conditions that can be held simultaneously
SERIALIZABLE term ensures no violations causing dirty read and unrepeatable read
Violations at lower isolation levels than SERIALIZABLE:
1. Dirty read: T1 reads the update of T2 before T2 commits
2. Non repeatable read: T1 reads a value from a table, T2 updates that value, and T1 reads a different value
Recovery means restoring the database to the most recent consistent state just before the time of failure
System must keep information about changes applied to data items in the System Log
Catastrophic Failure (Physical Damage) recovery method is backup that restores a past copy of the database to storage and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed up log up to the time of failure
Non-catastrophic failure recovery strategy is to identify changes that may cause an inconsistency in the database
For non-catastrophic failure, recovery protocol does not need a complete copy of the database, entries in the online system log on disk are analyzed for appropriate actions
Two main techniques for recovery from non-catastrophic transaction failures: Deferred Update and Immediate Update
Deferred Update:
Updates are recorded in the log before reaching commit
After commit, updates are written to the database on disk
If a transaction fails before commit, UNDO is not needed
Immediate Update:
Database may be updated by some operations of a transaction before reaching commit
Operations must be recorded in the log on disk before being applied to the database
If a transaction fails after recording changes but before commit, UNDO is required