Text and binary

Cards (23)

  • Text files
    Sequence of characters that can be read by humans
  • Character encoding
    Usually Unicode (for example UTF-8), but other encodings such as ASCII can be used
  • Opening a text file in a text editor

    1. The text editor reads the file and identifies any control characters
    2. Each control character is either acted on or replaced with a different representation
    3. The resulting text is then displayed
  • Control characters
    Non-printing characters that the editor interprets as instructions rather than displaying as text
  • Newline character
    • ASCII code 10, Unicode U+000A, often written as \n
    • The backslash is an escape character; together with the character that follows, it forms an escape sequence
  • When the text editor reads the newline character, it does not display it; instead, any text following it is placed on a new line
  • Understanding control characters is important as they can cause issues when working with text files
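
A minimal Python sketch of the newline cards above. The sample string is hypothetical; the point is only to show the newline's numeric code and the difference between showing the escape sequence and acting on it.

```python
# Hypothetical sample text: two lines separated by one newline control character.
text = "first line\nsecond line"

newline_code = ord("\n")      # numeric code of the newline character
raw_view = repr(text)         # shows the escape sequence instead of acting on it
lines = text.splitlines()     # acts on the newline: splits the text into lines

print(newline_code)
print(raw_view)
for line in lines:
    print(line)
```

Printing `raw_view` keeps the backslash and the `n` visible, while printing each entry of `lines` places the second part on its own line, which is roughly what a text editor does when it displays the file.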
  • CSV (Comma-Separated Values)
    • Type of text file widely supported by many programs
    • Useful format for moving data between applications
  • CSV file
    • Uses a character set like ASCII or Unicode
    • Consists of records (typically one per line) divided into fields separated by delimiters
    • Has the same sequence of fields for every record
    • File name extension is generally .csv
  • CSV files may have field names encoded in the first line
  • When importing a CSV file, you are usually asked if there is a header line and what character has been used as the delimiter
  • Contents of shopping.csv
    • Item, Description, Qty
    • Rice, Organic, 1kg
    • Milk, Skimmed, 2.27l
    • Eggs, Free-range, 60
    • Sugar, Brown, 500g
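
As a rough illustration of importing the file above, the sketch below reads the same contents with Python's standard `csv` module. It assumes the first line is a header and that the delimiter is a comma followed by a space; the data is held in a string so the example runs without a file on disk.

```python
import csv
import io

# Contents of shopping.csv from the card above, held in memory for the example.
shopping_csv = """Item, Description, Qty
Rice, Organic, 1kg
Milk, Skimmed, 2.27l
Eggs, Free-range, 60
Sugar, Brown, 500g
"""

# skipinitialspace=True drops the space that follows each comma delimiter.
reader = csv.reader(io.StringIO(shopping_csv), delimiter=",", skipinitialspace=True)
rows = list(reader)

# Assumption: the first line holds the field names, the rest are records.
header, records = rows[0], rows[1:]
items = [dict(zip(header, record)) for record in records]

for item in items:
    print(item["Item"], item["Qty"])
```

Every field is read as text, so a quantity such as `1kg` or `60` stays a string until the importing program decides how to interpret it.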
  • All data on a computer is stored in binary sequences of 1s and 0s
  • Text files

    Binary codes represent characters, which can be displayed in text editors
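
A small Python sketch of the idea that text is stored as binary codes. The sample string and the choice of UTF-8 are assumptions made for the example.

```python
# Hypothetical sample text ending with a newline control character.
text = "Hi\n"

encoded = text.encode("utf-8")                      # characters -> bytes
bits = [format(byte, "08b") for byte in encoded]    # each byte as eight 1s and 0s
decoded = encoded.decode("utf-8")                   # bytes -> characters again

print(list(encoded))
print(bits)
```

A text editor performs the decoding step, which is why the same bytes appear as readable characters rather than as numbers.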
  • Binary files
    • Hold information that is not character based, such as sound or image data
    • Encoded in a form that can be directly manipulated by a computer program
  • Most computer files are binary rather than text files, as humans rarely need to read the raw file data
  • File headers
    Bytes at the start of a file that tell the program reading it what kind of data follows; especially important in program binary files
  • Program binary file header
    • Begins with the two bytes 'MZ', the initials of Mark Zbikowski, one of the designers of the MS-DOS executable format
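
A hedged sketch of how a program might check for this header. The function name `looks_like_mz_executable` is hypothetical, and a real loader checks much more than the first two bytes.

```python
def looks_like_mz_executable(data: bytes) -> bool:
    """Return True if the data starts with the two-byte 'MZ' prefix."""
    return data[:2] == b"MZ"


# Hypothetical byte strings standing in for the start of two different files.
program_start = b"MZ\x90\x00"
other_start = b"\x89PNG"

print(looks_like_mz_executable(program_start))
print(looks_like_mz_executable(other_start))
```

With a file on disk, the same check could be applied to the first two bytes read after opening it in binary mode (`"rb"`).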
  • Bitmapped graphic files
    • Images represented as a collection of pixels, each with an RGB colour code
    • Metadata also included in image files, such as dimensions and creation date
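
To make the pixel and metadata cards concrete, here is a sketch that stores a tiny image as bytes. The layout (a 4-byte header holding width and height, followed by 3 bytes per pixel) is invented for this example and is not a real image format.

```python
import struct

# A hypothetical 2x2 image: each pixel is a (red, green, blue) colour code.
width, height = 2, 2
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]

# Metadata: width and height as two little-endian 16-bit unsigned integers.
header = struct.pack("<HH", width, height)
# Pixel data: the three colour channels of each pixel, one byte per channel.
body = bytes(channel for pixel in pixels for channel in pixel)
image_bytes = header + body

# Reading the file back: the metadata first, then groups of three bytes.
read_width, read_height = struct.unpack("<HH", image_bytes[:4])
data = image_bytes[4:]
decoded_pixels = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]

print(read_width, read_height, decoded_pixels)
```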
  • Sound files
    • Analogue signal converted to binary by sampling at fixed intervals
    • Each sample is stored as a binary number; on playback the process is reversed to recreate the sound wave
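
The sampling described above can be sketched in Python as follows. The sample rate, frequency, and 8-bit sample size are assumptions chosen to keep the output short; real audio uses far higher rates, and the sine wave only stands in for an analogue signal.

```python
import math

sample_rate = 8   # samples per second (hypothetical, very low for readability)
frequency = 1     # wave frequency in Hz
duration = 1      # length of the recording in seconds

samples = []
for n in range(sample_rate * duration):
    t = n / sample_rate                                  # time of this sample
    amplitude = math.sin(2 * math.pi * frequency * t)    # value from -1.0 to 1.0
    samples.append(round((amplitude + 1) / 2 * 255))     # scaled to 0..255

encoded = bytes(samples)                                 # one byte per sample

# Playback reverses the scaling to approximate the original wave.
restored = [value / 255 * 2 - 1 for value in encoded]

print(list(encoded))
```

The restored values only approximate the original wave, because each sample was rounded to one of 256 levels and nothing is known about the signal between samples.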
  • Program files
    • Start as text files containing instructions written in a programming language (source code)
    • Compiled into a machine-executable binary format before being run
  • Software companies are often reluctant to document and share the exact format of their binary files due to concerns about cybercrime and reverse-engineering
  • The binary is processed by the program associated with it, and is translated back into the instructions and resources that the program needs