Binary number system

Cards (29)

  • Whole numbers such as 7, 12, and 3988 are called integers. Unsigned integers represent only non-negative values (zero and above), while signed integers can be positive or negative.
  • You have learned that n bits can represent 2^n different binary values. Knowing this, you can calculate the largest and smallest values that can be held in a binary sequence.
    If the sequence represents an unsigned integer, the minimum value that can be represented with n bits is 0, for which all the n bits are set to 0.
    The maximum value occurs when all the n bits are set to 1 and it is calculated as 2^n - 1 .
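The rule above can be sketched in Python (n = 8 is just an example value):

```python
# Range of an n-bit unsigned integer.
n = 8
minimum = 0            # all n bits set to 0
maximum = 2 ** n - 1   # all n bits set to 1
print(minimum, maximum)  # 0 255
```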
  • In computer systems, numbers are represented with a set number of bits. If two large numbers are added together and the addition of the most significant bits produces a carry bit, then the result will exceed the available number of bits. This is called an overflow error. With 8-bit numbers, overflow errors occur when the result is larger than 11111111.
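A quick illustration of the 8-bit case, using a bit mask to simulate the limited storage (the masking is my own device for the demo):

```python
# Simulated 8-bit unsigned addition: the carry out of the MSB is lost.
a = 0b11111111               # 255, the largest 8-bit unsigned value
b = 0b00000001
full_result = a + b          # 256: needs 9 bits
stored = full_result & 0xFF  # only the lowest 8 bits fit; the carry bit is lost
print(stored)                # 0
```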
  • You are unlikely to encounter overflow errors in your day-to-day programming, because computers generally store numbers using a very large number of bits. Some programming languages, such as Python, do not limit the amount of storage allocated for integers, although you may encounter overflow errors when performing calculations with very large real numbers.
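The contrast between Python's unlimited integers and its fixed-size floats can be seen directly:

```python
# Python ints grow without limit, but floats are fixed-size IEEE 754 doubles.
big_int = 2 ** 10000              # works: arbitrary-precision integer
overflowed = False
try:
    big_float = 2.0 ** 10000      # too large for a double
except OverflowError:
    overflowed = True
print(overflowed)                 # True
```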
  • To convert from decimal to binary, divide the decimal number by 2 repeatedly until the quotient becomes zero. Record the remainders in reverse order to get the binary equivalent.
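The repeated-division method can be sketched as follows (the function name is my own):

```python
def to_binary(n):
    """Convert a non-negative decimal integer to a binary string
    by repeated division by 2, collecting remainders in reverse order."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))

print(to_binary(13))  # "1101"
```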
  • When working with signed integers, there are additional issues to consider. The most significant bit (MSB) is used to indicate whether the number is negative or positive. When the MSB is set to 1, the number is negative; otherwise, it is positive.
  • Binary numbers can also be converted into hexadecimal notation. To do so, split the binary string into groups of four bits at a time, starting from the right side. Convert each group of four bits into its corresponding hexadecimal digit. Finally, join these digits together to form the final hexadecimal number.
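A sketch of the grouping method in Python (the helper name is my own):

```python
def bin_to_hex(bits):
    """Convert a binary string to hexadecimal by grouping bits in fours
    from the right and converting each group to one hex digit."""
    # Pad on the left so the length is a multiple of 4.
    width = (len(bits) + 3) // 4 * 4
    bits = bits.zfill(width)
    digits = [format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)]
    return "".join(digits)

print(bin_to_hex("10111100"))  # "BC"
```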
  • In binary, there are several ways to represent signed integers, the most common being two's complement.
  • In the two’s complement representation, the most significant bit has a place value (or weight) of -2^(n-1).
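The negative weight of the MSB can be demonstrated directly (subtracting 2^n from the unsigned value is equivalent to giving the MSB a weight of -2^(n-1) instead of +2^(n-1)):

```python
def twos_complement_value(bits):
    """Interpret a bit string as an n-bit two's complement integer."""
    n = len(bits)
    value = int(bits, 2)      # value as an unsigned integer
    if bits[0] == "1":
        value -= 2 ** n       # MSB contributes -2^(n-1) rather than +2^(n-1)
    return value

print(twos_complement_value("1110"))  # -2  (-8 + 4 + 2)
```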
  • The results of calculations are not always whole numbers and so they can't be expressed accurately as integers. Hence, computer systems need to be able to represent numbers with a fractional part.
    Fractional numbers in binary can be represented in fixed point form or floating point form.
  • The place values of fractional numbers in binary are doubled every time we move one digit to the left of the binary point, and are halved every time we move one digit to the right of the binary point.
  • To represent a denary number in fixed point form or floating point form, the whole part of the number needs to be broken down into a sum of powers of 2. The fractional part needs to be broken down into a sum of fractions with denominators of powers of 2.
    The whole part of a number can always be represented in this way if there are enough binary digits available. However, for the fractional part of some numbers, that is not always possible.
    For example, 3/5 cannot be represented accurately as the denominator is not a power of 2.
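The 3/5 example can be checked in Python: the float literal 0.6 silently stores a nearby fraction whose denominator is a power of 2.

```python
from fractions import Fraction

# 3/5 cannot be stored exactly in binary: its denominator is 5, not a power of 2.
stored = Fraction(0.6)   # the exact value the float 0.6 actually holds
print(stored == Fraction(3, 5))  # False: only an approximation is stored
d = stored.denominator
print(d & (d - 1) == 0)          # True: the stored denominator IS a power of 2
```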
  • Fixed point arithmetic uses a fixed position for the binary point, which does not change during an operation. This makes it easy to add or subtract numbers, but multiplication and division become more complex because the position of the binary point must be adjusted.
  • Floating point arithmetic allows the binary point to move within a number. It simplifies multiplication and division, because the exponents can simply be added or subtracted. Addition and subtraction are more involved, as the exponents must first be aligned before the mantissas can be combined.
  • Computers use different formats for storing floating point numbers depending on their size and precision requirements.
  • Binary floating point notation represents real numbers using a mantissa and an exponent. The mantissa holds the significant digits of the number, while the exponent determines where the binary point is placed.
  • When converting from denary to binary, the first step is to divide the number by two repeatedly until the quotient becomes zero. Each remainder obtained during these divisions represents a bit in the binary equivalent of the number.
  • Truncation deals with the problem of having insufficient bits to store a number by discarding the bits that cannot be stored. The bits discarded will be the least significant bits.
  • When you round a binary number, you look at the first bit that you can't store (the bit that is to be discarded).
    • If the first bit to discard is 1, you add 1 to the last bit (the least significant bit) of the number that you can store.
    • If the first bit to discard is 0, you just discard the extra bits (which has the same effect as truncation).
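The two rules above can be sketched for bit strings (the function names are my own):

```python
def truncate_bits(frac_bits, k):
    """Keep only the first k bits; discard the rest (truncation)."""
    return frac_bits[:k]

def round_bits(frac_bits, k):
    """Round to k bits by inspecting the first bit to be discarded."""
    kept = int(frac_bits[:k], 2)
    if len(frac_bits) > k and frac_bits[k] == "1":
        kept += 1  # first discarded bit is 1: add 1 to the least significant kept bit
    return format(kept, f"0{k}b")

print(truncate_bits("10111", 3))  # "101"
print(round_bits("10111", 3))     # "110"
```

Note that if all the kept bits are 1, adding 1 carries out and the rounded result needs an extra bit.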
  • Binary numbers that cannot be represented accurately given a specific number of bits are represented by an approximate value, obtained either by truncating or by rounding.
  • The difference between the approximate value of the binary representation of a number (i.e. the number that you are able to store) and the original value of the number is called the absolute error. Absolute errors are always expressed as positive values (the magnitude of the difference).
  • The relative error is the absolute error divided by the original value of the number (multiplied by 100 for a percentage). It is usually expressed as a decimal number or as a percentage.
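Both definitions can be computed directly. In this sketch, 0.59375 (0.10011 in binary) stands in for a 5-fractional-bit approximation of 0.6:

```python
def absolute_error(original, stored):
    """Magnitude of the difference between the original and stored values."""
    return abs(original - stored)

def relative_error(original, stored):
    """Absolute error divided by the original value."""
    return absolute_error(original, stored) / abs(original)

orig, stored = 0.6, 0.59375       # 0.10011 in binary
print(absolute_error(orig, stored))   # ~0.00625
print(relative_error(orig, stored))   # ~0.0104, i.e. about 1%
```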
  • In general, when we use fewer bits to represent a number, there is more chance of getting larger relative errors.
  • Rounding is used to reduce the size of the relative error compared to truncation.
  • The absolute error is not always a clear indication of how significant the error in the representation of a number is. This is why it is so useful to calculate the relative error.
    As a general rule, for a given absolute error, the bigger the original number, the smaller the ‘impact’ of the error.
  • The mantissa of a floating point number holds the detail of the value of a number, while the exponent is used as a positive or negative factor that can increase or decrease the magnitude of the number. Hence, increasing the number of bits used for the mantissa results in more precision, while increasing the number of bits used for the exponent allows a wider range of available numbers.
  • Normalised floating point form can maximise the available precision of fractional numbers for a given number of bits, while fixed point form only allows for a set number of bits after the binary point, which can limit the available precision.
  • In general, calculations that involve floating point numbers take longer than calculations involving fixed point numbers. This is because floating point numbers require a more complex algorithm that shifts the binary point as part of evaluating a number. However, the wider range offered by floating point form often outweighs the speed advantage of fixed point form.
  • Underflow occurs when the result of a calculation is too small, i.e. it is too close to zero, to be represented with the available number of bits.
    For example, underflow can occur when subtracting two numbers with similar values and the same sign; when subtracting two positive numbers that are close in value, the result may be a number very near to zero.
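Underflow can be demonstrated with Python floats (IEEE 754 doubles); multiplying two tiny numbers is used here as a simple trigger:

```python
# The true product, 1e-400, is smaller than the smallest representable
# double (~4.9e-324), so the result underflows to zero.
tiny = 1e-200
product = tiny * tiny
print(product)  # 0.0
```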