Programming language translators

Cards (17)

  • Assembly Code
    Computers execute machine code. It is difficult to read, write and debug machine code. Assembly code instructions are equivalent to machine code but easier for humans to work with.
  • Assembler
    Assembly code is a low-level language. An assembler translates assembly code instructions into machine code. Each processor has its own instruction set, so the object code produced is hardware specific.
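    A minimal sketch of what an assembler does, assuming a made-up 8-bit instruction set (a 4-bit opcode followed by a 4-bit operand). The mnemonics and opcode values below are hypothetical and do not belong to any real processor:

```python
# Hypothetical instruction set: 4-bit opcode, 4-bit operand (not a real CPU).
OPCODES = {"LDA": 0x1, "ADD": 0x2, "STA": 0x3, "HLT": 0xF}

def assemble(source: str) -> list[int]:
    """Translate each assembly instruction into one machine-code byte."""
    machine_code = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        if not 0 <= operand <= 0xF:
            raise ValueError(f"operand out of range: {operand}")
        # Opcode goes in the high four bits, operand in the low four bits.
        machine_code.append((OPCODES[mnemonic] << 4) | operand)
    return machine_code

program = "LDA 5\nADD 6\nSTA 7\nHLT"
print([f"{byte:08b}" for byte in assemble(program)])
```

    A different processor would need a different opcode table, which is why the output of an assembler is hardware specific.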
  • Compiler
    Translates a whole program written in a high-level language into executable machine code before the program is run. The resulting machine code is called object code, and it is hardware specific.
  • Interpreter
    Translates code written in a high-level language into machine code and executes it. It does this line by line rather than translating the whole program before any of it can be executed.
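    The line-by-line behaviour can be sketched with a toy interpreter. The two-statement language here (`name = number` and `print name`) is invented purely for illustration, and the sketch executes each line directly rather than producing machine code:

```python
def interpret(source: str) -> list[str]:
    """Run a toy language one line at a time and return what it printed."""
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):
            name = line[len("print "):].strip()
            if name not in variables:
                # The error is only found when execution reaches this line.
                raise NameError(f"line {line_number}: {name} is not defined")
            print(variables[name])
            output.append(str(variables[name]))
        elif "=" in line:
            name, value = line.split("=", 1)
            variables[name.strip()] = int(value)
        else:
            raise SyntaxError(f"line {line_number}: cannot understand {line!r}")
    return output

interpret("age = 17\nprint age")
```

    Because each line is handled only when it is reached, earlier lines have already run by the time an error on a later line is reported.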
  • Advantages of a Compiler vs Interpreter
    Compiler:
    • Program can be run many times without the need to recompile
    • Faster to execute
    • Executable code does not require the interpreter to run
    • Compiled code cannot be easily read and copied by others
    Interpreter:
    • Source code can be run on any machine with the interpreter
    • If a small error is found, no need to recompile the entire program
  • Bytecode
    Java is compiled into bytecode, which is an intermediate step between source code and machine code. Bytecode is interpreted by a bytecode interpreter, e.g. the Java Virtual Machine (JVM).
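    The card describes Java, but the same idea can be seen in Python, which also compiles source code to bytecode for its own virtual machine. A small sketch using the standard `compile` function and `dis` module (this is Python bytecode, not Java bytecode):

```python
import dis

source = "age = 17\nprint(age)"

# Compile the source text into a code object holding bytecode.
code_object = compile(source, "<example>", "exec")

print(type(code_object.co_code))  # the raw bytecode is a bytes object
dis.dis(code_object)              # human-readable listing of the bytecode
```

    The exact instructions listed by `dis.dis` vary between Python versions, so no output is shown here.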
  • Stages of Compilation
    A compiler goes through several stages to convert source code to object code:
    • Lexical analysis
    • Symbol table
    • Syntax analysis
    • Semantic analysis
    • Code generation
  • Lexical Analysis
    All unnecessary spaces and all comments are removed. Keywords, constants and identifiers are replaced with tokens representing their function in the program.
  • Lexical Analysis Example

    age = 17
    print(age)
    May produce the tokens:
    <identifier><operator><number><keyword><open_bracket><identifier><close_bracket>
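    A minimal lexer sketch that reproduces the token stream above. The keyword list and token patterns are assumptions chosen to fit this example, not the rules of any particular language:

```python
import re

KEYWORDS = {"print", "if", "while"}  # assumed keyword list for the example

TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
    ("open_bracket", r"\("),
    ("close_bracket", r"\)"),
    ("skip", r"\s+|#.*"),  # whitespace and comments are discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (token_type, text) pairs, dropping spaces and comments."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        kind, text = match.lastgroup, match.group()
        position = match.end()
        if kind == "skip":
            continue
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens

tokens = tokenize("age = 17\nprint(age)  # show the age")
print("".join(f"<{kind}>" for kind, _ in tokens))
```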
  • Symbol table
    The lexer builds up a symbol table containing an entry for every keyword and identifier in the program. The symbol table helps to keep track of the run-time memory address of each identifier.
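    One way to picture a symbol table is as a dictionary keyed by name. In this sketch the token list is written out by hand, and the `kind` and `address` fields are illustrative assumptions; a real compiler stores more information and fills in addresses at a later stage:

```python
def build_symbol_table(tokens):
    """Create one entry per keyword or identifier, in order of first appearance."""
    table = {}
    for kind, text in tokens:
        if kind in ("keyword", "identifier") and text not in table:
            # The address is unknown at this point, so it starts as None.
            table[text] = {"kind": kind, "address": None}
    return table

tokens = [
    ("identifier", "age"), ("operator", "="), ("number", "17"),
    ("keyword", "print"), ("open_bracket", "("),
    ("identifier", "age"), ("close_bracket", ")"),
]
symbol_table = build_symbol_table(tokens)
for name, entry in symbol_table.items():
    print(name, entry)
```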
  • Syntax Analysis
    The stream of tokens from the lexing stage is split up into phrases. Each phrase is parsed, which means it is checked against the rules of the language. If the phrase is not valid, an error is recorded. E.g. <number><operator><identifier> may not be valid and would be picked up by syntax analysis.
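    A simplified sketch of checking phrases against language rules. Real parsers work from a grammar; here the "rules" are just a hypothetical set of allowed token patterns, which is enough to show an invalid phrase being recorded as an error:

```python
# Assumed rules for a tiny language: the only valid phrase shapes.
VALID_PHRASES = {
    ("identifier", "operator", "number"),                        # e.g. age = 17
    ("keyword", "open_bracket", "identifier", "close_bracket"),  # e.g. print(age)
}

def check_syntax(phrases):
    """Return a list of error messages, one per invalid phrase."""
    errors = []
    for phrase_number, phrase in enumerate(phrases, start=1):
        if tuple(phrase) not in VALID_PHRASES:
            shape = "".join(f"<{kind}>" for kind in phrase)
            errors.append(f"phrase {phrase_number}: invalid syntax {shape}")
    return errors

phrases = [
    ["identifier", "operator", "number"],   # age = 17
    ["number", "operator", "identifier"],   # 17 = age
]
print(check_syntax(phrases))
```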
  • Semantic Analysis
    It is possible to create a sequence of tokens which is valid syntax but is not a valid program. This is what semantic analysis checks for. E.g. <if><identifier><operator><number> is valid syntax; however, if the identifier has not previously been declared, it is semantically not a valid program.
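    The undeclared-identifier check can be sketched as follows. The statement format, a pair of (name being assigned, names being used), is a simplification invented for this example:

```python
def check_declared(statements):
    """Report any identifier that is used before it has been declared."""
    declared = set()
    errors = []
    for statement_number, (assigned, used) in enumerate(statements, start=1):
        for name in used:
            if name not in declared:
                errors.append(f"statement {statement_number}: {name} is not declared")
        if assigned is not None:
            declared.add(assigned)
    return errors

valid_program = [("age", []), (None, ["age"])]   # age = 17, then print(age)
invalid_program = [(None, ["age"])]              # print(age) with no declaration
print(check_declared(valid_program))
print(check_declared(invalid_program))
```

    Both programs would pass syntax analysis; only the semantic check tells them apart.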
  • Code Generation
    Once the program has been checked, the compiler generates the machine code. It may do this in several "passes" over the code because code optimisation will also take place.
  • Code Optimisation
    Aims to:
    • reduce redundant instructions
    • replace inefficient code with code that achieves the same result but in a more efficient way
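    One common optimisation of this kind is constant folding, where an expression made only of constants is worked out at compile time. The sketch below uses Python's standard `ast` module (and `ast.unparse`, which assumes Python 3.9 or later); it illustrates the idea and is not how any particular compiler implements it:

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric constants with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fold = FOLDABLE.get(type(node.op))
        if (
            fold is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            result = fold(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=result), node)
        return node

def optimise(source: str) -> str:
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(optimise("seconds = 60 * 60 * 24"))
```

    The two multiplications no longer need to be carried out every time the program runs, yet the program produces the same result.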
  • Libraries
    Sets of pre-written functions. A programmer can write their own libraries. A library function can be called within a program.
  • Linker
    The linker needs to put the appropriate memory addresses in place so that the program can call and return from a library function.
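    A minimal sketch of that address-filling step, assuming a toy program format of (opcode, operand) pairs in which a call to a library function initially holds the function's name. The opcode names, function names and addresses are hypothetical:

```python
def link(program, library_addresses):
    """Replace each library function name in a CALL with its memory address."""
    linked = []
    for opcode, operand in program:
        if opcode == "CALL" and isinstance(operand, str):
            if operand not in library_addresses:
                raise NameError(f"undefined symbol: {operand}")
            operand = library_addresses[operand]
        linked.append((opcode, operand))
    return linked

program = [("LOAD", 4), ("CALL", "sqrt"), ("STORE", 5)]
library_addresses = {"sqrt": 200, "abs": 240}  # assumed locations of library code
print(link(program, library_addresses))
```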
  • Loader
    The job of the loader is to copy the program and any linked subroutines into main memory so that it can run. When the executable code was created, it may have assumed that the program would be loaded at memory address 0. However, memory addresses in the program will need to be relocated by the loader because some memory will already be in use.
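    Relocation can be sketched as adding the actual load address to every address in the program. The (opcode, operand) format and the set of opcodes that take an address are assumptions made for this example:

```python
# Hypothetical opcodes whose operand is a memory address.
ADDRESS_OPCODES = {"JMP", "CALL", "LOAD", "STORE"}

def relocate(program, base_address):
    """Shift every address operand by the address the program was loaded at."""
    relocated = []
    for opcode, operand in program:
        if opcode in ADDRESS_OPCODES:
            operand += base_address
        relocated.append((opcode, operand))
    return relocated

# Code generated as if it would be loaded at address 0.
program = [("LOAD", 4), ("ADD", 1), ("JMP", 0)]
print(relocate(program, 1000))
```

    Only operands that are addresses are changed; the constant 1 in the ADD instruction stays the same.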