ENGGEN 101G


  • A single bit can be interpreted as:
    • The outcome of an event (“success” or “failure”)
    • The response to a yes-no question
    • Presence or absence of some feature 
  • All our data is stored using switches with two states - on or off. Binary data is stored by controlling the state of each switch
    Transistor - an electronic component that behaves like a controllable switch; the technology behind solid-state drives
  • ASCII is used to represent text. ASCII: American Standard Code for Information Interchange. Developed in the 1960s to standardise how data is represented in binary. Each character is represented by one byte (a 7-bit code stored in 8 bits)
  • Unicode provides additional character encodings (characters from other languages). UCS-2 uses two bytes per character. UCS-4 uses four bytes per character - not widely adopted as it took up too much space
  • UTF-8 - variable-length encoding: ASCII characters keep their single byte, while other characters use two to four bytes
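    A quick Python check of the variable-length idea (illustrative, not from the notes):

    ```python
    # UTF-8 is variable-length: plain ASCII keeps its single byte,
    # while other characters take 2-4 bytes.
    for ch in ["A", "é", "€"]:
        encoded = ch.encode("utf-8")
        print(ch, len(encoded), encoded.hex())
    # A 1 41
    # é 2 c3a9
    # € 3 e282ac
    ```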
  • To store an image, e.g. a black-and-white image, each black pixel is encoded as 1 and each white pixel as 0
  • Data Communications
    • Goal: send data (a message) from a source to a destination. The message travels: source -> transmitter -> channel -> receiver -> destination
    • Types of channels: 1. Wired - USB, HDMI, Ethernet, fibre optics. 2. Wireless - satellites, cell phones, WiFi, Bluetooth
    • Both the transmitter and receiver must use the same protocol (method of communicating)
  • Three main parts in data communication
    • Transmitter - takes the raw data (bits) and turns them into signals suitable for the channel. E.g. voltages/currents on a wire for USB and Ethernet; radio waves for WiFi and cell phones
    • Channel - alters and distorts the transmitted signal
    • Receiver - takes an incoming signal and works out what the signal means based on the noisy and distorted received data/signal
    • Errors could occur
  • A series of 0s and 1s represents the ASCII character ‘H’: 01001000
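    A one-line Python check of this encoding (illustrative):

    ```python
    # 'H' has ASCII code 72, which is 01001000 in binary.
    print(format(ord("H"), "08b"))   # 01001000
    print(chr(int("01001000", 2)))   # H
    ```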
  • Impact Of Distortion & Noise
    • Longer cables attenuate and distort the data signals, so the signal at the end of the cable does not match the input signal. Longer cables are typically thicker to reduce distortion (and optical fibre can be used, as light signals distort less)
    • Noise is random variation superimposed on the signal. If the noise is too large, the 0s and 1s in the output signal blur together (i.e. errors are made)
  • Communication Protocols
    • Sequencing - how to break a long message into smaller packets the channel can handle, and how to order the packets to reconstruct the message
    • Routing - path from source to destination
    • Formatting - extra data for info on sequencing and routing
    • Flow control - ensures resource sharing and prevents congestion (e.g. multiple laptops connected to one access point)
    • Error control - detecting errors and fixing them if possible
    • Connection establishment - how to establish and end a connection between two devices
  • Bit error = occurs when noise/distortion alters a transmitted bit in the channel
    Bit Error Rate (BER) = (number of bit errors) / (total number of bits)
    Ideally keep the rate low - BER is a useful measure of how reliable a communication system is, and no channel has a BER of 0
  • A BER of 0.5 means half the bits are flipped, so the output is no better than a coin toss. Try to get the BER under 0.1
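    A minimal simulation sketch of BER, assuming a channel that flips each bit independently with a made-up probability flip_prob:

    ```python
    import random

    def simulate_ber(n_bits=100_000, flip_prob=0.01):
        # Count how many bits the channel flips; the measured BER
        # should come out close to flip_prob.
        errors = sum(random.random() < flip_prob for _ in range(n_bits))
        return errors / n_bits

    print(simulate_ber())   # roughly 0.01
    ```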
  • If an error is detected in a packet, discard it and retransmit the data. The packet error rate is the proportion of packets that have to be resent (ideally near 0). It is useful to segment data into packets so we don't have to retransmit everything when an error occurs
  • Parity bits - for every 7 bits of data…
    • If there is an even number of 1s, set the 8th bit to 0
    • If there is an odd number of 1s, set the 8th bit to 1
    If the received number of 1s disagrees with the parity bit, an error has occurred
    If two bit errors occur in transmission, the parity still comes out correct, so the error cannot be detected in this case
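    A minimal sketch of this even-parity scheme, including the two-error case it cannot detect:

    ```python
    def add_parity(bits7):
        # Set the 8th bit so the total number of 1s is even.
        return bits7 + [sum(bits7) % 2]

    def check_parity(bits8):
        # An odd count of 1s means at least one bit error occurred.
        return sum(bits8) % 2 == 0

    word = add_parity([1, 0, 0, 1, 0, 0, 0])   # the 7 data bits of 'H'
    word[2] ^= 1                               # one bit error in the channel
    print(check_parity(word))                  # False: error detected
    word[3] ^= 1                               # a second bit error
    print(check_parity(word))                  # True: two errors cancel out, undetected
    ```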
  • Repetition Codes
    • Repeat the same bit multiple times (e.g. send each bit three times). An error is detected when the repeated bits are not all the same; with enough repeats the error can be corrected by majority vote
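    A minimal sketch of a 3-repetition code with majority-vote decoding:

    ```python
    def encode_rep(bits, n=3):
        # Send each bit n times.
        return [b for b in bits for _ in range(n)]

    def decode_rep(coded, n=3):
        # Majority vote over each group of n received bits.
        return [1 if sum(coded[i:i + n]) > n // 2 else 0
                for i in range(0, len(coded), n)]

    coded = encode_rep([1, 0])    # [1, 1, 1, 0, 0, 0]
    coded[1] ^= 1                 # one bit error in the channel
    print(decode_rep(coded))      # [1, 0]: the error is corrected
    ```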
  • Hamming Codes
    • Use multiple parity bits, each covering a different sub-part of the sequence
    • E.g. take 16 bits of data and divide them into a 4 x 4 table, with a parity bit for each row and each column; a single bit error then flips one row parity and one column parity, and their intersection locates the bit so it can be corrected
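    A sketch of the row/column parity idea behind the 4 x 4 table (a simple two-dimensional parity scheme, shown here as an illustration rather than a full Hamming code):

    ```python
    def table_parity(rows):
        # Even parity for each row and each column of a 4 x 4 bit table.
        row_par = [sum(r) % 2 for r in rows]
        col_par = [sum(rows[r][c] for r in range(4)) % 2 for c in range(4)]
        return row_par, col_par

    rows = [[1, 0, 1, 1],
            [0, 0, 1, 0],
            [1, 1, 0, 0],
            [0, 1, 0, 1]]
    row_par, col_par = table_parity(rows)   # parities sent along with the data

    rows[2][1] ^= 1                         # single bit error in the channel
    new_rp, new_cp = table_parity(rows)
    bad_row = [r for r in range(4) if new_rp[r] != row_par[r]][0]
    bad_col = [c for c in range(4) if new_cp[c] != col_par[c]][0]
    print(bad_row, bad_col)                 # 2 1: the intersection locates the flipped bit
    ```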
  • Two main types of compression
    • Lossy - reduce size by discarding unneeded info (e.g. MP3 audio, JPEG images)
    • Lossless - reduce size by looking for redundancy in the data; nothing is lost when compressed (e.g. ZIP files, lossless audio formats, PDF)
  • Dictionary-based compression
    • Each word is written as x / y, where x gives the page number and y the position of the word on that page
    Dictionary-based compression challenges
    • No advantage for short words or for phrases (multiple words)
  • Adaptive Scheme
    Adaptive schemes build the dictionary from scratch. The basic idea is to look for repeating “words”: the first time a word is encountered, add it to the dictionary; every subsequent time, reference the dictionary entry instead. Both the dictionary and the compressed data are sent
    The adaptive scheme does not work if the compressed data plus dictionary is larger than the original; it needs a certain amount of data before dictionary references outweigh the overhead of the keys. How to determine the “word” size? It depends on the data - see the sketch below
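    A minimal sketch of one adaptive scheme (LZ78-style; this matches the idea described above, though not necessarily the exact scheme from the course):

    ```python
    def adaptive_compress(text):
        # Grow the dictionary as "words" repeat; the output is a list of
        # (dictionary_index, next_char) pairs referencing earlier entries.
        dictionary = {"": 0}
        out, phrase = [], ""
        for ch in text:
            if phrase + ch in dictionary:
                phrase += ch                    # keep extending a known phrase
            else:
                out.append((dictionary[phrase], ch))
                dictionary[phrase + ch] = len(dictionary)
                phrase = ""
        if phrase:
            out.append((dictionary[phrase], ""))
        return out

    print(adaptive_compress("ababababab"))
    # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (2, '')]
    ```

    Longer repeats get referenced by ever-larger dictionary entries, which is where the compression comes from; on short inputs the output pairs can be larger than the original, matching the failure case above.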
  • Word Size
    • Size of “words” in dictionary
    • It will be the longest length of repeating data
    Dictionary Size
    • Amount of data that is loaded into memory
    • Used to detect common “words”
    Solid block size
    • Minimum amount of data per part
    • One dictionary per part
  • An image is stored as a set of RGB pixels
    Lossy Compression Advantages
    • Achieves higher compression (better than lossless)
    • Fast to implement
    • Often much of the data is not needed (e.g. colour detail in an image beyond the point we can perceive)
    Lossy Compression Disadvantages
    • Data is lost and cannot be restored (e.g. after rescaling and recolouring)
    • Repeated lossy compression loses more data each time
  • AI - the creation of computers capable of carrying out activities that normally involve human intelligence
    AI modelling - involves designing algorithms and models that can learn from data and make predictions or decisions
  • Machine Learning
    Machine learning extracts rules from data without explicit programming; the model (an algorithm, equation, or neural network) learns features/patterns and works out the rules for classifying things itself
    Give many examples to the algorithm and it will work out that when it sees similar images, they fit under a certain classification
  • Machine Learning
    You need three components in machine learning - input data points, examples of the expected output, and a suitable metric to evaluate the algorithm’s performance
    If the data is insufficient or flawed, the model will not learn properly or classify well - e.g. incorrect labels, missing values, irrelevant data, a limited number of examples, blurred or unclear images
  • AI (ANI VS AGI)
    ANI - often referred to as weak AI, is designed for a specific task or group of tasks without general intelligence or consciousness, e.g. models used for speech recognition, self-driving cars, human biometrics
    AGI - possesses human-level intelligence across tasks, like a human brain, rather than being limited to one domain
    ANI has a training phase and a testing phase. Test the model on data that wasn’t seen in the training phase (not in the training dataset); this is called generalisation. If it does not correctly classify the unseen data, you may need a better evaluation metric or a larger dataset, for example
  • Supervised Learning
    Supervised - both the inputs and the expected outputs are known. To distinguish an apple, you need counter-examples that are not apples, but not something like cars, which is too different. There is a “teacher” that provides the labels under which the model must learn to classify things
    Regression - find the line/curve of best fit that represents the data well. The output is continuous - see the sketch below
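    A minimal least-squares sketch of regression (the data points are made up):

    ```python
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8]           # made-up example data

    # Fit y = m*x + c by least squares.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    print(m, c)   # about 1.96 and 0.14 - a continuous output, not a class label
    ```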
  • Unsupervised Learning
    Unsupervised - let the system/model find similarities in the data and group items together under classifications; also known as clustering (the cluster labels are discrete numbers). Two key challenges: determining the optimal number of clusters and avoiding inappropriate representations
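    A minimal clustering sketch using k-means on 1-D data (k-means is one common clustering algorithm; note the number of clusters k must be chosen in advance, the first challenge above):

    ```python
    import random

    def kmeans_1d(points, k=2, iters=10):
        # Repeatedly assign points to the nearest centre, then move
        # each centre to the mean of its assigned points.
        centres = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: abs(p - centres[i]))
                clusters[nearest].append(p)
            centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        return centres

    print(sorted(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])))
    # about [1.0, 9.07]: two groups found without any labels
    ```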
  • Reinforcement - like supervised learning there is a teacher, but the class/output is not known in advance. An agent (robot or computer) makes a choice in a situation; when it gets it wrong it is penalised, and when it gets it right it receives a reward, and over time this feedback teaches the agent what is right and wrong
  • Results
    Underfit - does not generalise well and does not find a good fit for the data points
    Overfit - the fitted curve passes through every point and will not generalise; a proper fit can be achieved by reducing model complexity or increasing the number of data points
    Both underfitting and overfitting generalise poorly to unseen data
  • Reinforcement learning - teacher-guided (the teacher knows how to direct the learning rather than supplying explicit input-output pairs). The learning system, referred to as an agent, interacts with its environment and receives feedback in the form of rewards or penalties for its actions. These rewards and penalties are determined and communicated by a supervisor or teacher, shaping the learning process
  • ANN (Artificial Neural Networks) - the network acquires knowledge from its environment through a process of learning. The acquired knowledge is stored in the interneuron connection strengths (called synaptic weights) for information processing. Each input is multiplied by a weight and the weighted inputs are summed; the activation function limits the output signal to a finite range, e.g. between 0 and 1, based on a threshold
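    A minimal sketch of a single neuron with a sigmoid activation (the weights are made up):

    ```python
    import math

    def neuron(inputs, weights, bias):
        # Weighted sum of inputs plus bias, squashed into (0, 1)
        # by the sigmoid activation function.
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1 / (1 + math.exp(-total))

    print(neuron([0.5, 0.8], [0.4, -0.6], 0.1))   # about 0.455
    ```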
  • AI Neural Network
    Shallow/vanilla neural network - has few layers, typically one hidden layer plus an input and an output layer
    Deep neural network - has many hidden layers
    Neural networks have a parallel architecture; they are good at learning from examples and generalising their knowledge to unseen data
  • Architecture of the network
    • Layers - the number and types (input, output, hidden) determine the complexity and capability of the network
    • Neurons per layer - more neurons can capture more complex patterns but may lead to overfitting
    • Types of networks - feedforward, recurrent, etc.
    Activation functions
    • Sigmoid and ReLU determine how neuron inputs are transformed into outputs
  • Regularisation techniques - prevent overfitting and improve generalisation, e.g. by penalising overly large weights
  • AI Training Terms
    Epoch - one pass through the entire training dataset, going through all the examples
    Mini-batch - using a subset of the examples for training and adjusting the weights after each batch
    To make machine learning faster, training is accelerated with GPUs alongside the CPU
  • Perceptron - a computer model or computerised machine devised to represent or simulate the brain’s ability to recognise and discriminate
  • x_0 is always 1, so its weight acts as the bias. The optimiser adjusts the weights so the predicted value gets close to the true value. For each input vector x, the predicted output is computed using the activation function. If the output matches the target, there is no weight update. To stop perceptron training, either set a maximum number of iterations or set a target error and stop once the error meets it - see the sketch below
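    A minimal sketch of the perceptron training rule described above, learning the AND function (the learning rate and stopping limits are made-up values):

    ```python
    def train_perceptron(data, max_epochs=10, lr=0.1):
        # data: list of (inputs, target) pairs with targets 0 or 1.
        # x_0 is always 1, so weights[0] acts as the bias.
        weights = [0.0] * (len(data[0][0]) + 1)
        for _ in range(max_epochs):             # stop at a maximum number of iterations
            errors = 0
            for xs, target in data:
                x = [1.0] + list(xs)            # prepend the bias input x_0 = 1
                out = 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0
                if out != target:               # no weight update when output matches
                    errors += 1
                    weights = [w + lr * (target - out) * xi
                               for w, xi in zip(weights, x)]
            if errors == 0:                     # or stop once the error target is met
                break
        return weights

    # Learn the AND function from its four labelled examples.
    print(train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]))
    ```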
  • Encryption takes plaintext and converts it to ciphertext using a key
    Two approaches to encryption:
    • Symmetric encryption - same key is used to encrypt and decrypt the ciphertext
    • Asymmetric encryption - different key is used for encryption and decryption
    Theoretically all encryption schemes can be broken except for the one-time pad
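    A minimal sketch of symmetric encryption via XOR; with a truly random key as long as the message, this is the one-time pad:

    ```python
    import secrets

    def xor_bytes(data, key):
        # XOR each data byte with the matching key byte; applying the
        # same key again undoes the operation (same key encrypts and decrypts).
        return bytes(d ^ k for d, k in zip(data, key))

    plaintext = b"HELLO"
    key = secrets.token_bytes(len(plaintext))   # random key, same length as message
    ciphertext = xor_bytes(plaintext, key)
    print(xor_bytes(ciphertext, key))           # b'HELLO' recovered with the same key
    ```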