Unit 4.4 - Organisation of Data

Cards (15)

  • Batch processing is where a computer periodically completes high-volume, repetitive tasks
  • Transactional files hold data from day-to-day interactions, e.g. a till at a supermarket
  • Transactional Files
    • Temporary
    • Serial
    • At the end of a set time period, the records on the transactional file are used to update the master file and the transactional file is then wiped
  • Master Files hold data collected over a long time period, e.g. historical company data
  • Master Files
    • Sequentially ordered by key field
    • Used to perform batch processing
  • Batch Processing
    • A sorted transactional file is used to update a master file
    • A report, an updated master file and an error file are produced (a sketch of this update appears after the cards)
  • Batch processing requires a large amount of processing power so must be performed at an off-peak time
  • Batch processing is useful as it does not halt on errors; records that cause errors are stored in an error file to be handled later
  • Batch processing can take a large amount of time to complete, so a system might be unusable while undergoing this process
  • Serial Files

    Where records are stored in no particular order and new records are appended to the end of the file
  • Sequential Files

    Where records are ordered by primary key. Updating the file requires copying records to a new (temporary) file, as records cannot be inserted in place
  • Direct / Random Access Files

    Where records can be accessed at any time by jumping to their location, computed by a hashing algorithm, rather than searching the file for the data
  • Direct / Random Access Files
    • Split into fixed-length blocks of data
    • Data added is assigned a block location by a hashing algorithm
    • Data is retrieved by hashing the key to generate the location to look in (see the direct-access file sketch after the cards)
    • Too many blocks will cause space to be wasted
    • Too few blocks will cause collisions
  • Hashing Algorithm Requirements
    • Deterministic: The same input always produces the same hash value
    • Uniformity: Hash values should be evenly spread across the available blocks
    • Data Normalisation: Data fed to a hashing algorithm should be normalised, e.g. consistent case and whitespace
    • Continuity: Keys that differ by small amounts should have hash values that differ by small amounts
    • Non-Invertible: Hash values should not be reversible to obtain the original data
  • Block Overflow can be resolved by:
    • Using an overflow area (separate chaining) (see the overflow sketch after the cards)
    • Creating a new file
    • Using a new hashing algorithm
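
Sketch referenced from the batch processing card: a minimal example of updating a sequential master file from a sorted transactional file in a single pass, producing an updated master file, an error file and a simple report. The file names, the CSV layouts (key,balance and key,amount) and the assumption that keys compare consistently as strings are illustrative only, not taken from the cards.

```python
import csv

def batch_update(master_path, transactions_path, new_master_path, error_path):
    # Both input files are assumed to be sorted by the same key field,
    # so the update is a single sequential pass through each file.
    with open(master_path, newline="") as m, \
         open(transactions_path, newline="") as t, \
         open(new_master_path, "w", newline="") as out, \
         open(error_path, "w", newline="") as err:
        writer = csv.writer(out)
        errors = csv.writer(err)
        trans = csv.reader(t)
        applied = rejected = 0
        tx = next(trans, None)

        for key, balance in csv.reader(m):
            balance = float(balance)
            # Transactions whose key sorts before the current master key have
            # no matching master record, so they go to the error file.
            while tx is not None and tx[0] < key:
                errors.writerow(tx)
                rejected += 1
                tx = next(trans, None)
            # Apply every transaction that matches this master record's key.
            while tx is not None and tx[0] == key:
                balance += float(tx[1])
                applied += 1
                tx = next(trans, None)
            writer.writerow([key, f"{balance:.2f}"])

        # Transactions left over after the last master record are also errors.
        while tx is not None:
            errors.writerow(tx)
            rejected += 1
            tx = next(trans, None)

    # The "report" for the run: here just a printed summary.
    print(f"Applied {applied} transactions, rejected {rejected}")

batch_update("master.csv", "transactions.csv", "new_master.csv", "errors.csv")
```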
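
Sketch referenced from the direct / random access files card: jumping straight to a record's location by hashing its key into one of a fixed number of fixed-length slots on disk. The file name, slot sizes and toy hash function are assumptions for illustration, and each "block" holds just one record for simplicity; a collision here would overwrite, which the overflow sketch below handles properly.

```python
# Hypothetical fixed-length-slot file; sizes chosen only for illustration.
NUM_SLOTS = 16
SLOT_SIZE = 32                          # every record padded to a fixed length

def slot_for(key):
    # Deterministic hash: the same key always maps to the same slot number.
    return sum(ord(c) for c in key) % NUM_SLOTS

def create(path):
    with open(path, "wb") as f:
        f.write(b" " * NUM_SLOTS * SLOT_SIZE)      # pre-allocate empty slots

def store(path, key, value):
    record = f"{key},{value}".encode().ljust(SLOT_SIZE)[:SLOT_SIZE]
    with open(path, "r+b") as f:
        f.seek(slot_for(key) * SLOT_SIZE)          # jump straight to the slot
        f.write(record)

def fetch(path, key):
    with open(path, "rb") as f:
        f.seek(slot_for(key) * SLOT_SIZE)          # same hash -> same location
        record = f.read(SLOT_SIZE).decode().rstrip()
    if record.startswith(key + ","):
        return record.split(",", 1)[1]
    return None                                    # empty slot, or a collision

create("customers.dat")
store("customers.dat", "C042", "Grace Hopper")
print(fetch("customers.dat", "C042"))              # -> Grace Hopper
```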
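
Sketch referenced from the block overflow card: blocks modelled as small lists, with a separate overflow area (separate chaining) used when a record's home block is full. The block sizes, toy hash and sample data are made up; the sketch also shows the deterministic and data-normalisation requirements from the hashing card. The other two resolutions on the card (a new file, a new hashing algorithm) are not shown.

```python
NUM_BLOCKS = 4                      # deliberately few blocks so overflow occurs
RECORDS_PER_BLOCK = 2

def normalise(key):
    # Data normalisation: hash a consistent form of the key.
    return key.strip().lower()

def block_for(key):
    # Deterministic: the same (normalised) key always gives the same block.
    return sum(ord(c) for c in normalise(key)) % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]
overflow = []                       # the separate overflow area

def insert(key, record):
    block = blocks[block_for(key)]
    if len(block) < RECORDS_PER_BLOCK:
        block.append((normalise(key), record))
    else:
        overflow.append((normalise(key), record))   # home block full: overflow

def lookup(key):
    target = normalise(key)
    # Check the home block first, then fall back to the overflow area.
    for k, record in blocks[block_for(key)] + overflow:
        if k == target:
            return record
    return None

for i, name in enumerate(["Ada", "Grace", "Alan", "Edsger", "Barbara",
                          "Donald", "John", "Tim", "Linus"]):
    insert(f"ID{i}", name)
print(lookup(" id8 "))              # overflowed record, found via the overflow area
```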