DB

Cards (64)

  • A transaction is an atomic unit of work that is either completed in its entirety or not done at all
  • Transaction processing system: a system with large databases and hundreds of concurrent users executing database transactions
  • Examples of transaction processing systems include airline reservations, banking, credit card processing, online retail purchasing, stock markets, and supermarket checkouts
  • A transaction represents a logical unit of database processing that must be completed entirely for correctness
  • A transaction is typically implemented by a computer program and includes database commands such as retrievals, insertions, deletions, and updates
  • Concurrency can be achieved through interleaved processing (concurrent execution of processes in a single CPU) or parallel processing (concurrent execution in multiple CPUs)
  • Most concurrency control theory in databases is developed in terms of interleaved concurrency
  • A transaction is a set of logically related operations, containing a group of tasks performed by a single-user or multi-user to access database contents
  • Main operations of a transaction include Read(X), Write(X), and other operations like debit transactions
  • Commit operation is used to save work done permanently, while Rollback operation is used to undo work done
  • Transactions have four ACID properties: Atomicity, Consistency, Isolation, and Durability
  • Atomicity ensures all operations of a transaction take place at once or the transaction is aborted
  • Consistency maintains integrity constraints to keep the database consistent before and after the transaction
  • Isolation ensures data used during a transaction cannot be accessed by another transaction until the first one is completed
  • Durability indicates that changes made by a transaction are permanent and cannot be lost due to system failures
  • Transactions can be in states like Active, Partially-Committed, Committed, Failed, or Aborted
  • A schedule is a series of operations from one transaction to another, with types including Serial, Non-Serial, Serializable, Complete, and Recoverable schedules
  • Serial schedules execute one transaction completely before starting another, while non-serial schedules allow interleaving of operations
  • Serializable schedules allow transactions to execute concurrently without interfering with each other, identifying correct schedules with interleaving operations
  • Complete schedules end with either abort or commit operations, while recoverable schedules ensure each pair of transactions is recoverable
  • Complete schedule: If the last operations of each schedule are either abort or commit, then it is called a complete schedule
  • Recoverable Schedule: For each pair of transactions (Ti, Tj) such that Tj reads a data item previously written by Ti, the commit operation of Ti should appear before the commit operation of Tj
  • Cascadeless Schedule: For each pair of transactions (Ti, Tj) such that Tj reads a data item written by Ti, the commit operation of Ti should appear before the read operation of Tj
  • Strict Schedule: A value written by a transaction cannot be read or overwritten by another transaction until the transaction is either aborted or committed
  • Serialization Graph is used to test the Serializability of a schedule
  • Precedence graph:
    • Contains a pair G = (V, E), where V consists of a set of vertices representing transactions and E consists of a set of edges representing certain conditions
    • Conditions for creating edges Ti->Tj:
    1. Create a node Ti->Tj if Ti executes write (Q) before Tj executes read (Q)
    2. Create a node Ti->Tj if Ti executes read (Q) before Tj executes write (Q)
    3. Create a node Ti->Tj if Ti executes write (Q) before Tj executes write (Q)
  • Testing of Serializability:
    • If a precedence graph contains a single edge Ti->Tj, then all instructions of Ti are executed before the first instruction of Tj
    • If a precedence graph contains a cycle, then the schedule is non-serializable; if no cycle, then the schedule is serializable
  • Conflict Serializable Schedule:
    • A schedule is conflict serializable if, after swapping non-conflicting operations, it can transform into a serial schedule
    • Conflicting operations:
    • Belong to separate transactions
    • Involve the same data item
    • Contain at least one write operation
  • View Serializable Schedule:
    • A schedule is view serializable if it is view equivalent to a serial schedule
    • View equivalent conditions:
    1. Initial Read: Initial read operations in both schedules must be the same
    2. Updated Read: If Ti reads A updated by Tj in S1, then Ti should read A updated by Tj in S2
    3. Final Write: Final write operations must be the same in both schedules
  • Recoverability of Schedule:
    • Irrecoverable Schedule: Tj reads the updated value of Ti and Tj commits before Ti commits
    • Recoverable with Cascading Rollback: Tj reads the updated value of Ti, and commit of Tj is delayed till commit of Ti
    • Cascade Less Recoverable Schedule: T1 reads and writes A, commits, and T2 reads and writes A
  • Transaction Support in SQL:
    • Characteristics specified by SET TRANSACTION statement: access mode, diagnostic area size, isolation level
    • Access mode can be READ ONLY or READ WRITE; default is READ WRITE
    • Diagnostic area size specifies the number of conditions that can be held simultaneously
    • Isolation Level options: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE
  • Transaction Support in SQL (continued):
    • SERIALIZABLE term ensures no violations causing dirty read and unrepeatable read
    • Violations at lower isolation levels than SERIALIZABLE:
    1. Dirty read: T1 reads the update of T2 before T2 commits
    2. Non repeatable read: T1 reads a value from a table, T2 updates that value, and T1 reads a different value
  • Recovery means restoring the database to the most recent consistent state just before the time of failure
  • System must keep information about changes applied to data items in the System Log
  • Catastrophic Failure (Physical Damage) recovery method is backup that restores a past copy of the database to storage and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed up log up to the time of failure
  • Non-catastrophic failure recovery strategy is to identify changes that may cause an inconsistency in the database
  • For non-catastrophic failure, recovery protocol does not need a complete copy of the database, entries in the online system log on disk are analyzed for appropriate actions
  • Two main techniques for recovery from non-catastrophic transaction failures: Deferred Update and Immediate Update
  • Deferred Update:
    • Updates are recorded in the log before reaching commit
    • After commit, updates are written to the database on disk
    • If a transaction fails before commit, UNDO is not needed
  • Immediate Update:
    • Database may be updated by some operations of a transaction before reaching commit
    • Operations must be recorded in the log on disk before being applied to the database
    • If a transaction fails after recording changes but before commit, UNDO is required
    • Both UNDO and REDO may be needed during recovery
    • Known as the UNDO/REDO Algorithm