data is constantly moved around the systems of a computer and across different networks
this transfer is usually accurate and happens at very high speeds.
as the distance the data travels increases, the transfer becomes slower and more susceptible to interference
storage space on devices is also limited
why do we compress data?
text, images and sound files can all be reduced in size significantly
this works to ensure data is:
-> sent more quickly
-> sent using less bandwidth (as transfer limits may apply)
-> less likely to buffer, especially on video streams
-> taking up less storage space
Lossy compression
non-essential data is removed permanently from the file. When received, the file is reconstructed without this data.
eg, different shades of a colour in an image, frequencies outside the human hearing range, or quieter notes played at the same time as louder sounds are removed.
quality decreases
file size will also decrease
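To illustrate the idea, here is a minimal Python sketch (not from the notes, and not a real audio codec): some made-up 8-bit sound samples are quantised by permanently discarding their low bits, so the data shrinks but the reconstructed values are only approximately the originals.

```python
# A minimal sketch of the lossy idea using made-up 8-bit sample values.
# Fine detail (the low bits of each sample) is discarded permanently,
# so the reconstructed values are close to, but not identical to, the originals.

samples = [200, 203, 197, 64, 66, 63]         # hypothetical 8-bit sound samples
compressed = [s >> 4 for s in samples]        # keep only the top 4 bits (half the size)
reconstructed = [c << 4 for c in compressed]  # rebuild without the discarded detail

print(compressed)      # [12, 12, 12, 4, 4, 3]
print(reconstructed)   # [192, 192, 192, 64, 64, 48] -> quality lost, size reduced
```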
lossless compression
Patterns in the data are summarised into a shorter format without permanently removing any information - the original file can be reproduced exactly with no loss of data
eg, 1110000111 -> 314031 (three 1s, four 0s, three 1s)
The reduction in file size is less than for lossy compression but no quality is lost.
It is used for text files and software as they need to be entirely accurate when received.
RLE
run length encoding.
A basic method of compression that summarises consecutive patterns of the same data
Works well with image and sound data where data is often repeated many times.
eg a sound recording (typically sampled 44,100 times per second) could hold the same note for a fraction of a second, producing hundreds of identical samples. One example of the sample and the number of times it consecutively repeats is recorded instead.
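A minimal Python sketch of run length encoding (an illustration, not any particular file format): runs of identical values are stored as (count, value) pairs, and decoding rebuilds the data exactly, as in the 1110000111 -> 314031 example above.

```python
# Run length encoding: summarise consecutive runs of the same value.
# Decoding reverses this exactly, so no information is lost.

def rle_encode(data: str) -> list[tuple[int, str]]:
    runs = []
    for ch in data:
        if runs and runs[-1][1] == ch:
            runs[-1] = (runs[-1][0] + 1, ch)   # extend the current run
        else:
            runs.append((1, ch))               # start a new run
    return runs

def rle_decode(runs: list[tuple[int, str]]) -> str:
    return "".join(ch * count for count, ch in runs)

print(rle_encode("1110000111"))               # [(3, '1'), (4, '0'), (3, '1')]
print(rle_decode(rle_encode("1110000111")))   # '1110000111' -> recovered exactly
```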
dictionary compression
regularly occurring data is stored separately in a 'dictionary'
a reference to the dictionary entry is stored in the main file, reducing the amount of data stored
The dictionary adds some overhead code and file space, but the space saved outweighs this (if patterns are repeated frequently enough)
-> reduces file size for text: each character stored in ASCII needs 8 bits, whereas a dictionary reference could use as few as 2 bits
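A minimal Python sketch of the dictionary idea (illustrative only, not a real scheme such as LZW): repeated words are stored once in a dictionary, and the main data becomes a list of small references.

```python
# Dictionary compression sketch: store each distinct word once,
# then replace every occurrence with a short reference number.

text = "the cat sat on the mat and the cat ran"
words = text.split()

dictionary = {}                       # word -> reference number
for word in words:
    if word not in dictionary:
        dictionary[word] = len(dictionary)

references = [dictionary[word] for word in words]

print(dictionary)    # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'and': 5, 'ran': 6}
print(references)    # [0, 1, 2, 3, 0, 4, 5, 0, 1, 6] -> repeated words cost one small number each

# Decompression: look each reference back up in the dictionary.
lookup = {ref: word for word, ref in dictionary.items()}
print(" ".join(lookup[r] for r in references))   # original text, recovered exactly
```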
encryption of data
A way of making sure data cannot be understood by those who don't possess the means to decrypt it (a cipher)
The plaintext of a message is encrypted into equivalent ciphertext using a cipher algorithm and a key
When received, the data is decrypted using the same or a different key
Two methods at the opposite end of the security spectrum are the Caesar cipher and the Vernam cipher
Caesar cipher
The most basic type of encryption and the most insecure
Letters of the alphabet are shifted by a consistent amount
eg ABCDEFGHIJKLMNOPQRSTUVWXYZ
EFGHIJKLMNOPQRSTUVWXYZABCD
(A -> E and so on using a shift of 4)
Spaces are often removed to mask word lengths
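A minimal Python sketch of the Caesar cipher with a shift of 4, matching the table above (it assumes uppercase letters only, with spaces already removed):

```python
# Caesar cipher: shift every letter by the same amount, wrapping around the alphabet.

def caesar(text: str, shift: int) -> str:
    return "".join(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) for ch in text)

ciphertext = caesar("HELLOWORLD", 4)
print(ciphertext)                # 'LIPPSASVPH'
print(caesar(ciphertext, -4))    # 'HELLOWORLD' -> decryption is just shifting back
```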
vernam cipher
the encryption key is also known as the one-time pad
The key must be:
-> equal to or greater in length than the plaintext and only ever used once
-> truly random, generated from a physical and unpredictable phenomenon (eg atmospheric noise or radioactive decay)
-> shared with the recipient by hand, independently of the message, and destroyed immediately after use
decoding a vernam cipher
Encryption and decryption of the message is performed bit by bit using an exclusive or (XOR) operation with the shared key
eg 01001100 <- key used for encryption
01100011 <- original message
00101111 <- result of XOR
(the XOR is applied again with the same key to decode the message, which is only possible if you have the one-time pad)
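A minimal Python sketch of the XOR step, using the same bit patterns as the example above: XOR with the key encrypts, and XOR with the same key again decrypts.

```python
# Vernam cipher XOR step: encrypt and decrypt bit by bit with the shared key.

key       = 0b01001100
plaintext = 0b01100011

ciphertext = plaintext ^ key
print(format(ciphertext, "08b"))        # 00101111 -> matches the result above
print(format(ciphertext ^ key, "08b"))  # 01100011 -> original message recovered
```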
decoding ciphers
A brute force attack: attempts to apply every possible key to decrypt ciphertext until one works
frequency analysis: compares how often each letter appears in the ciphertext with how often letters normally appear in the language (eg E is the most common letter in English)
Given enough ciphertext, computer power and time, any key (except the one-time pad) can be determined and the message cracked
The Vernam cipher, used with its one-time pad, is the only cipher proven to be unbreakable
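A minimal Python sketch of the frequency analysis described above, run against a made-up Caesar ciphertext (shift of 4): because E is normally the most common letter in English, the most common ciphertext letter suggests the shift, provided there is enough ciphertext.

```python
# Frequency analysis sketch: count ciphertext letters and guess the Caesar shift
# by assuming the most common one stands for E.

from collections import Counter

ciphertext = "QIIXQIEXXLIKVIIRXVIIEXWIZIR"   # 'MEETMEATTHEGREENTREEATSEVEN' shifted by 4
counts = Counter(ciphertext)

most_common_letter = counts.most_common(1)[0][0]
guessed_shift = (ord(most_common_letter) - ord("E")) % 26
print(most_common_letter, guessed_shift)      # I 4 -> guesses the shift correctly
```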