Unit 4.4 - Organisation of Data

Cards (15)

  • Batch processing is where a computer periodically completes high-volume, repetitive tasks
  • Transactional files hold data from day-to-day interactions, e.g. a till at a supermarket
  • Transactional Files
    • Temporary
    • Serial
    • At the end of a set time period, the records on the transactional file are used to update the master file and the transactional file is then wiped
  • Master Files hold data collected over a long time period, e.g. historical company data
  • Master Files
    • Sequentially ordered by key field
    • Used to perform batch processing
  • Batch Processing
    • A sorted transactional file is used to update a master file
    • A report, an updated master file and an error file are produced (a sketch of this update appears after the cards)
  • Batch processing requires a large amount of processing power so must be performed at an off-peak time
  • Batch processing is useful as it does not halt on errors; records that cause errors are stored in an error file to be handled later
  • Batch processing can take a large amount of time to complete, so a system might be unusable while undergoing this process
  • Serial Files

    Where records are stored in no particular order and new records are appended to the end of the file
  • Sequential Files

    Where records are ordered by primary key. Updating the file requires copying records to a new (temporary) file, as records cannot be inserted in place
  • Direct / Random Access Files

    Where records can be accessed at any time by jumping to their location, computed by a hashing algorithm, rather than searching the file for the data
  • Direct / Random Access Files
    • Split into fixed-length blocks of data
    • Data added is assigned a block location by a hashing algorithm
    • Data is retrieved by hashing the key to generate the location to look in (see the direct-access file sketch after the cards)
    • Too many blocks will cause space to be wasted
    • Too few blocks will cause collisions
  • Hashing Algorithm Requirements
    • Deterministic: The same input always produces the same hash value
    • Uniformity: Hash values should be evenly spread across the available blocks
    • Data Normalisation: Data fed to a hashing algorithm should be normalised, e.g. consistent case and whitespace
    • Continuity: Keys that differ by small amounts should have hash values that differ by small amounts
    • Non-Invertible: Hash values should not be reversible to obtain the original data
  • Block Overflow can be resolved by:
    • Using an overflow area (separate chaining) (see the overflow sketch after the cards)
    • Creating a new file
    • Using a new hashing algorithm
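
Sketch referenced from the batch processing card: a minimal example of updating a sequential master file from a sorted transactional file in a single pass, producing an updated master file, an error file and a simple report. The file names, the CSV layouts (key,balance and key,amount) and the assumption that keys compare consistently as strings are illustrative only, not taken from the cards.

```python
import csv

def batch_update(master_path, transactions_path, new_master_path, error_path):
    # Both input files are assumed to be sorted by the same key field,
    # so the update is a single sequential pass through each file.
    with open(master_path, newline="") as m, \
         open(transactions_path, newline="") as t, \
         open(new_master_path, "w", newline="") as out, \
         open(error_path, "w", newline="") as err:
        writer = csv.writer(out)
        errors = csv.writer(err)
        trans = csv.reader(t)
        applied = rejected = 0
        tx = next(trans, None)

        for key, balance in csv.reader(m):
            balance = float(balance)
            # Transactions whose key sorts before the current master key have
            # no matching master record, so they go to the error file.
            while tx is not None and tx[0] < key:
                errors.writerow(tx)
                rejected += 1
                tx = next(trans, None)
            # Apply every transaction that matches this master record's key.
            while tx is not None and tx[0] == key:
                balance += float(tx[1])
                applied += 1
                tx = next(trans, None)
            writer.writerow([key, f"{balance:.2f}"])

        # Transactions left over after the last master record are also errors.
        while tx is not None:
            errors.writerow(tx)
            rejected += 1
            tx = next(trans, None)

    # The "report" for the run: here just a printed summary.
    print(f"Applied {applied} transactions, rejected {rejected}")

batch_update("master.csv", "transactions.csv", "new_master.csv", "errors.csv")
```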
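
Sketch referenced from the direct / random access files card: jumping straight to a record's location by hashing its key into one of a fixed number of fixed-length slots on disk. The file name, slot sizes and toy hash function are assumptions for illustration, and each "block" holds just one record for simplicity; a collision here would overwrite, which the overflow sketch below handles properly.

```python
# Hypothetical fixed-length-slot file; sizes chosen only for illustration.
NUM_SLOTS = 16
SLOT_SIZE = 32                          # every record padded to a fixed length

def slot_for(key):
    # Deterministic hash: the same key always maps to the same slot number.
    return sum(ord(c) for c in key) % NUM_SLOTS

def create(path):
    with open(path, "wb") as f:
        f.write(b" " * NUM_SLOTS * SLOT_SIZE)      # pre-allocate empty slots

def store(path, key, value):
    record = f"{key},{value}".encode().ljust(SLOT_SIZE)[:SLOT_SIZE]
    with open(path, "r+b") as f:
        f.seek(slot_for(key) * SLOT_SIZE)          # jump straight to the slot
        f.write(record)

def fetch(path, key):
    with open(path, "rb") as f:
        f.seek(slot_for(key) * SLOT_SIZE)          # same hash -> same location
        record = f.read(SLOT_SIZE).decode().rstrip()
    if record.startswith(key + ","):
        return record.split(",", 1)[1]
    return None                                    # empty slot, or a collision

create("customers.dat")
store("customers.dat", "C042", "Grace Hopper")
print(fetch("customers.dat", "C042"))              # -> Grace Hopper
```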
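
Sketch referenced from the block overflow card: blocks modelled as small lists, with a separate overflow area (separate chaining) used when a record's home block is full. The block sizes, toy hash and sample data are made up; the sketch also shows the deterministic and data-normalisation requirements from the hashing card. The other two resolutions on the card (a new file, a new hashing algorithm) are not shown.

```python
NUM_BLOCKS = 4                      # deliberately few blocks so overflow occurs
RECORDS_PER_BLOCK = 2

def normalise(key):
    # Data normalisation: hash a consistent form of the key.
    return key.strip().lower()

def block_for(key):
    # Deterministic: the same (normalised) key always gives the same block.
    return sum(ord(c) for c in normalise(key)) % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]
overflow = []                       # the separate overflow area

def insert(key, record):
    block = blocks[block_for(key)]
    if len(block) < RECORDS_PER_BLOCK:
        block.append((normalise(key), record))
    else:
        overflow.append((normalise(key), record))   # home block full: overflow

def lookup(key):
    target = normalise(key)
    # Check the home block first, then fall back to the overflow area.
    for k, record in blocks[block_for(key)] + overflow:
        if k == target:
            return record
    return None

for i, name in enumerate(["Ada", "Grace", "Alan", "Edsger", "Barbara",
                          "Donald", "John", "Tim", "Linus"]):
    insert(f"ID{i}", name)
print(lookup(" id8 "))              # overflowed record, found via the overflow area
```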