For n bits, unsigned values range from a minimum of 0 to a maximum of 2^n - 1, giving 2^n distinct values in total
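As a quick check of the range formula, here is a minimal sketch for unsigned values. The function name unsigned_range is just an illustrative choice, not something from these notes.

```python
def unsigned_range(n_bits):
    """Return (minimum, maximum, count) for an unsigned value of n_bits bits."""
    count = 2 ** n_bits          # 2^n distinct bit patterns
    return 0, count - 1, count   # smallest value, largest value, how many values

for n in (4, 8, 16):
    print(n, unsigned_range(n))
```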
We can multiply binary numbers by breaking the multiplier down into powers of two (2^n), left-shifting the other number by n places for each one, and adding the results together
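The shift-and-add idea can be sketched as follows, assuming both numbers are non-negative integers. The name shift_add_multiply is hypothetical.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only left shifts and additions."""
    result = 0
    shift = 0
    while b:
        if b & 1:                 # this power of two is present in the multiplier
            result += a << shift  # add the other number shifted left by that many places
        b >>= 1                   # move on to the next bit of the multiplier
        shift += 1
    return result

# 1011 (11) x 0101 (5): 5 = 2^2 + 2^0, so the result is (11 << 2) + (11 << 0)
print(shift_add_multiply(0b1011, 0b0101))  # 55
```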
When adding positive and negative binary numbers together in two's complement, a final carry out of the leftmost bit can be disregarded - it is not part of the result, even though it would normally look like an overflow
Negative binary numbers can be represented using two's complement.
A signed binary number uses the leftmost bit to determine if it is positive (0) or negative (1)
We can perform binary subtraction by negating the number to be subtracted (converting it to its two's complement negative), and then performing addition
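A minimal sketch of subtraction by adding a negative, assuming 8-bit two's complement values. BITS, MASK and subtract are illustrative names. Masking to 8 bits is what discards the final carry.

```python
BITS = 8
MASK = (1 << BITS) - 1   # 0b11111111, keeps only the low 8 bits

def subtract(a, b):
    """Compute a - b on 8-bit patterns by adding the two's complement of b."""
    neg_b = (-b) & MASK          # 8-bit two's complement pattern for -b
    return (a + neg_b) & MASK    # the mask throws away any carry out of bit 7

# 7 - 3: 00000111 + 11111101 = 1 00000100, and the leading carry is dropped
print(format(subtract(0b00000111, 0b00000011), "08b"))  # 00000100
```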
To convert a positive binary number to its negative, start from the right and copy every bit up to and including the first 1. Then, flip every remaining bit to the left of it.
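The copy-then-flip rule can be sketched on a string of bits. This assumes a fixed-width two's complement bit string; negate_bits is a hypothetical name.

```python
def negate_bits(bits):
    """Negate a two's complement bit string using the copy-then-flip rule."""
    last_one = bits.rfind("1")        # position of the rightmost 1
    if last_one == -1:                # all zeros: negating 0 gives 0
        return bits
    # Flip everything to the left of the rightmost 1...
    flipped = "".join("1" if b == "0" else "0" for b in bits[:last_one])
    # ...and copy the rightmost 1 and any 0s after it unchanged.
    return flipped + bits[last_one:]

print(negate_bits("00001100"))  # 11110100  (12 becomes -12)
```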
We can use a floating point number to represent fractions, where each place value to the right of the binary point is half the one before it (1/2, 1/4, 1/8, ...)
A floating point number consists of the mantissa - the significant digits of the number, and the exponent - which determines where the binary point is placed
A positive exponent means the final number will be bigger than the mantissa. A negative exponent means it will be smaller.
We can pad a number on the left with 0s if it is positive or 1s if it is negative, because these extra leading bits are not significant and do not change its value
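A small sketch of this padding, assuming two's complement bit strings. sign_extend and to_int are illustrative helper names.

```python
def sign_extend(bits, width):
    """Pad a two's complement bit string on the left with copies of its sign bit."""
    if width < len(bits):
        raise ValueError("target width is smaller than the number")
    return bits[0] * (width - len(bits)) + bits

def to_int(bits):
    """Read a two's complement bit string as a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":               # leftmost bit set means negative
        value -= 1 << len(bits)
    return value

print(sign_extend("0101", 8), to_int(sign_extend("0101", 8)))  # 00000101 5
print(sign_extend("1011", 8), to_int(sign_extend("1011", 8)))  # 11111011 -5
```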
An exponent tells us how many places to shift the mantissa
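To see the mantissa and exponent working together, here is a rough sketch of decoding a floating point number. It assumes a simplified format in which both parts are two's complement and the binary point sits straight after the mantissa's first bit; the names twos_to_int and decode are hypothetical.

```python
def twos_to_int(bits):
    """Read a two's complement bit string as a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def decode(mantissa, exponent):
    """Decode a mantissa/exponent pair (assumed format: point after the first mantissa bit)."""
    m = twos_to_int(mantissa) / (1 << (len(mantissa) - 1))  # place the binary point
    e = twos_to_int(exponent)                               # places to shift the point
    return m * 2.0 ** e

print(decode("01010000", "0011"))  # 0.101 shifted 3 places right = 101.0 = 5.0
print(decode("01000000", "1111"))  # 0.1 shifted 1 place left = 0.01 = 0.25
```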
Some numbers cannot be represented exactly in binary, only approximately, because no finite sum of halves, quarters, eighths and so on adds up to exactly that value, eg 0.1
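The repeating pattern for 0.1 can be seen by generating its binary places one at a time. This is a minimal sketch using exact fractions; binary_fraction is an illustrative name.

```python
from fractions import Fraction

def binary_fraction(value, places):
    """Return the first `places` binary places of a fraction between 0 and 1."""
    bits = []
    for _ in range(places):
        value *= 2                 # doubling moves the next binary place in front of the point
        if value >= 1:
            bits.append("1")
            value -= 1
        else:
            bits.append("0")
    return "0." + "".join(bits)

print(binary_fraction(Fraction(1, 10), 12))  # 0.000110011001 (0011 keeps repeating)
print(binary_fraction(Fraction(3, 8), 3))    # 0.011 (exact)
```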
The absolute error is the difference between the target value and the value that is actually stored. The relative error is the absolute error divided by the target value, usually expressed as a percentage.
Relative error takes the magnitude of the value into account, but absolute error doesn't. The same absolute error could be minuscule relative to one value but massive relative to another
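A short sketch comparing the two kinds of error. The function names and the sample values are illustrative only.

```python
def absolute_error(target, stored):
    """Size of the difference between the target value and the stored value."""
    return abs(target - stored)

def relative_error_percent(target, stored):
    """Absolute error as a percentage of the target value."""
    return absolute_error(target, stored) / abs(target) * 100

# The same absolute error of 0.5 matters very differently for each target.
for target, stored in ((1000.0, 999.5), (1.0, 0.5)):
    print(target, absolute_error(target, stored), relative_error_percent(target, stored))
```

Here the absolute error is 0.5 both times, but it is a tiny fraction of 1000 and half of 1.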
With fixed point, there is a trade-off between range and precision depending on where the binary point is placed. Processing is generally faster than with floating point
With floating point, the same number of bits can represent a large range or great precision depending on how they are distributed between the mantissa and exponent - just not at the same time. Processing speeds are slower than fixed point
A normalised floating point number must always have the mantissa start with 01 if positive or 10 if negative
To normalise a floating point number, move the binary point so that there is only one digit to the left of it (the bit that determines if it is positive or negative) - everything else should be fractional. Then adjust the exponent to match the number of places the point was moved
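Normalising can be sketched as repeatedly shifting the mantissa left until its first two bits differ. This assumes a two's complement mantissa held as a bit string with the binary point after the first bit, and an exponent held as a plain integer; normalise is a hypothetical name.

```python
def normalise(mantissa, exponent):
    """Shift the mantissa left until it starts with 01 or 10, adjusting the exponent."""
    if "1" not in mantissa:
        return mantissa, exponent        # zero has no normalised form
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"    # drop a redundant leading bit, fill with 0 on the right
        exponent -= 1                    # each left shift is undone by lowering the exponent
    return mantissa, exponent

print(normalise("00010100", 5))  # ('01010000', 3)
print(normalise("11101000", 3))  # ('10100000', 1)
```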
An underflow error occurs if a number is too small (too close to zero) to be represented in the provided format. This tends to result in a stored value of 0
An overflow error occurs if a number is too large to be represented in the provided format, for example when the result of a calculation needs more bits than are available.
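Both errors can be observed with Python's own floats. This is only an illustration: it assumes Python floats are 64-bit IEEE 754 doubles, which is a different and larger format than the simplified one in these notes.

```python
import sys

# Overflow: going past the largest representable float gives infinity.
largest = sys.float_info.max
overflowed = largest * 10
print(overflowed)    # inf

# Underflow: halving the smallest positive float leaves nothing to store but 0.
smallest = 5e-324
underflowed = smallest / 2
print(underflowed)   # 0.0
```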