Information coding systems

Cards (10)

  • To represent text digitally, each character needs its own unique bit-pattern. Bit-patterns are combinations of 1s and 0s used to represent data inside a computer. The bit-pattern used for each character becomes a numeric character code.
  • For computers to communicate and exchange text with each other efficiently, they must have an agreed standard that defines which character code is used for which character. A standardised collection of characters and the bit-patterns used to represent them is called a character set.
  • ASCII stands for 'American Standard Code for Information Interchange'. It was defined in 1963 and was one of the most common character sets used. It started by using 7 bits to represent characters, which allowed for a maximum of 128 characters to be represented.
  • These days, 8 bits (1 byte) are used to store each character in the ASCII character set. The original coding system remains, but each code now has a preceding 0. The eighth bit was sometimes used as a parity bit to check for errors during data transmission (a short parity sketch follows these cards).
  • When text is encoded and stored using ASCII, each character is assigned a denary (decimal) character code, which is represented and stored in the computer as binary (a short worked sketch follows these cards).
  • There are also non-standard extensions to ASCII, sometimes referred to as extended ASCII. These are schemes where the additional codes that arose from an 8-bit system were allocated to represent additional characters. However, such schemes varied from country to country, so they were not very useful for global communication. In modern coding schemes, only the first 128 codes are retained, allowing compatibility with the original ASCII coding scheme.
  • The problem with ASCII is that it only allows a small number of characters to be represented (128 for standard 7-bit ASCII). This may be enough for the English alphabet, but it is not sufficient to represent all of the world's languages and scripts, along with all of the possible numbers and symbols.
  • The widespread use of the World Wide Web made it more important to have a universal international coding system, as the range of platforms and programs has increased dramatically, with more developers from around the world using a much wider range of characters.
  • Unicode is a widely adopted international character encoding standard designed to support the vast majority of written languages across the globe. The original Unicode design used two bytes (16 bits) per character, giving a capacity of over 65,000 different characters; modern Unicode extends well beyond this, defining more than a million possible code points that are stored using encodings such as UTF-8 and UTF-16.
  • The first 128 codes in Unicode and ASCII represent the same characters, so plain ASCII text is also valid Unicode text (the encoding sketch after these cards demonstrates this using UTF-8).
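
The relationship between characters, denary character codes and bit-patterns described in the cards above can be illustrated with a short sketch (Python is used here purely for illustration):

    # Minimal sketch: ord() gives the denary character code for a character,
    # and format(..., "07b") shows the same code as a 7-bit ASCII bit-pattern.
    for ch in "A", "a", "0", " ":
        code = ord(ch)                # denary character code, e.g. 65 for 'A'
        bits = format(code, "07b")    # the same code as a 7-bit pattern
        print(f"{ch!r}: denary {code}, bit-pattern {bits}")

    print("Characters representable with 7 bits:", 2 ** 7)   # 128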
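
The parity bit mentioned above can be sketched as follows. This assumes even parity and is only an illustration; real transmission schemes differ in the details:

    # Minimal even-parity sketch: the eighth bit is chosen so that the
    # total number of 1s in the byte is even.
    def with_even_parity(code7: int) -> int:
        ones = bin(code7).count("1")     # number of 1s in the 7-bit code
        parity = ones % 2                # 1 if that count is odd, else 0
        return (parity << 7) | code7     # place the parity bit as bit 7

    byte = with_even_parity(ord("C"))    # 'C' is 1000011: three 1s
    print(format(byte, "08b"))           # 11000011: four 1s, now even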
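
The compatibility between ASCII and Unicode described in the last two cards can be checked directly. The sketch below assumes the UTF-8 encoding, in which the first 128 code points occupy a single byte identical to the ASCII code, while characters outside that range take two to four bytes:

    # The first 128 Unicode code points match ASCII, so 'A' still encodes
    # to the single byte 65 in UTF-8; other characters need more bytes.
    for ch in "A", "é", "€", "😀":
        print(f"{ch!r}: code point U+{ord(ch):04X}, "
              f"UTF-8 bytes {list(ch.encode('utf-8'))}")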