Programming language translators

Created by

mexa

Assembly Code
Computers execute machine code. It is difficult to read, write and debug machine code. Assembly code instructions are equivalent to machine code but easier for humans to work with.
Assembler
Assembly code is a low level language. An assembler translates assembly code instructions into machine code. Each processor architecture has its own instruction set, so the object code produced is hardware specific.
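A minimal sketch of what an assembler does, assuming a made-up 8-bit instruction format (4-bit opcode, 4-bit operand). The mnemonics and opcode values in `OPCODES` are hypothetical and do not belong to any real processor.

```python
# Hypothetical instruction set: 4-bit opcode followed by a 4-bit operand.
OPCODES = {"LDA": 0b0001, "ADD": 0b0010, "STA": 0b0011, "HLT": 0b1111}

def assemble(source):
    """Translate each line of assembly code into one machine-code byte."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else 0
        # Opcode goes in the high four bits, operand in the low four bits.
        machine_code.append((OPCODES[mnemonic] << 4) | operand)
    return machine_code

program = """
LDA 5
ADD 6
STA 7
HLT
"""
print([format(byte, "08b") for byte in assemble(program)])
# ['00010101', '00100110', '00110111', '11110000']
```

A different processor would need a different `OPCODES` table, which is why the output is hardware specific.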
Compiler
Translates a whole program written in a high level language into executable machine code. The resulting machine code is called object code. The object code produced is hardware specific.
Interpreter 
Translates code written in a high level language into machine code one line at a time, executing each line as soon as it has been translated, rather than translating the whole program before any of it can be executed.
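The line-by-line behaviour can be sketched with a toy interpreter. The two-statement language here (`name = number` and `print name`) is invented for illustration, and the sketch runs each line directly instead of emitting real machine code.

```python
def interpret(source):
    """Run a toy language one line at a time and collect what it prints."""
    variables, output = {}, []
    for line_number, line in enumerate(source.strip().splitlines(), start=1):
        words = line.split()
        if len(words) == 3 and words[1] == "=":
            variables[words[0]] = int(words[2])
        elif len(words) == 2 and words[0] == "print":
            output.append(variables[words[1]])
        else:
            output.append(f"error on line {line_number}")
            break  # later lines are never translated or run
    return output

print(interpret("age = 17\nprint age\nage + 1\nprint age"))
# [17, 'error on line 3']
```

Lines 1 and 2 run before the error on line 3 is discovered, whereas a compiler would report the error before producing anything that can run.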
Advantages of a compiler over an interpreter:
Program can be run many times without the need to recompile
Faster to execute
Executable code does not require the interpreter to run
Compiled code cannot be easily read and copied by others
Advantages of an interpreter over a compiler:
Source code can be run on any machine with the interpreter
If a small error is found, no need to recompile the entire program
Bytecode
Java is compiled into bytecode, which is an intermediate step between source code and machine code. Bytecode is interpreted by a bytecode interpreter, e.g. the Java Virtual Machine (JVM).
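A minimal sketch of a bytecode interpreter, assuming a tiny stack machine. The instruction names `PUSH`, `ADD` and `PRINT` are hypothetical and are not real Java Virtual Machine instructions.

```python
def run_bytecode(code):
    """Interpret a list of (instruction, argument) pairs on a stack machine."""
    stack, output = [], []
    for instruction, argument in code:
        if instruction == "PUSH":
            stack.append(argument)
        elif instruction == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif instruction == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return output

# Bytecode a compiler might produce for: print(17 + 1)
bytecode = [("PUSH", 17), ("PUSH", 1), ("ADD", None), ("PRINT", None)]
print(run_bytecode(bytecode))  # [18]
```

The same `bytecode` list would run on any machine that has `run_bytecode`, which is the idea behind distributing bytecode rather than machine code.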
Stages of Compilation
A compiler goes through several stages to convert source code to object code:
Lexical analysis
Symbol table
Syntax analysis
Semantic analysis
Code generation
Lexical Analysis 
All unnecessary spaces and all comments are removed. Keywords, constants and identifiers are replaced with tokens representing their function in the program.
Lexical Analysis Example 
age = 17
print(age)
May produce the tokens:
<identifier><operator><number><keyword><open_bracket><identifier><close_bracket>
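A minimal sketch of a lexer that produces this token stream for the two-line example. The token names follow the card; the regular expressions and the `KEYWORDS` set are assumptions made for illustration. In Python 3 `print` is an ordinary identifier, but it is treated as a keyword here to match the card.

```python
import re

KEYWORDS = {"print"}  # assumption: print is classed as a keyword, as on the card
TOKEN_PATTERNS = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
    ("open_bracket", r"\("),
    ("close_bracket", r"\)"),
    ("skip", r"\s+|#.*"),  # white space and comments are discarded
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)

def tokenise(source):
    """Return a list of (token_kind, text) pairs for the source code."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens

source = "age = 17  # set the age\nprint(age)"
print("".join(f"<{kind}>" for kind, _ in tokenise(source)))
# <identifier><operator><number><keyword><open_bracket><identifier><close_bracket>
```

This sketch silently ignores characters it does not recognise; a real lexer would report them as errors.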
Symbol table 
The lexer will build up a symbol table containing an entry for every keyword and identifier in the program. The symbol table helps to keep track of the run-time memory address for each identifier.
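A minimal sketch of building a symbol table from a token list. The entry layout (`kind`, `index`, `address`) is an assumption chosen for illustration; real compilers store more detail, such as data type and scope.

```python
def build_symbol_table(tokens):
    """Give each distinct keyword and identifier one entry, in order of first use."""
    symbol_table = {}
    for kind, text in tokens:
        if kind in ("identifier", "keyword") and text not in symbol_table:
            symbol_table[text] = {
                "kind": kind,
                "index": len(symbol_table),
                "address": None,  # filled in later, once memory is allocated
            }
    return symbol_table

# Tokens for:  age = 17  /  print(age)
tokens = [
    ("identifier", "age"), ("operator", "="), ("number", "17"),
    ("keyword", "print"), ("open_bracket", "("),
    ("identifier", "age"), ("close_bracket", ")"),
]
for name, entry in build_symbol_table(tokens).items():
    print(name, entry)
```

Although `age` appears twice in the program, it gets a single entry, so every use of the identifier refers to the same memory address.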
Syntax Analysis 
The stream of tokens from the lexing stage is split up into phrases. Each phrase is parsed, which means it is checked against the rules of the language. If a phrase is not valid, an error is recorded. E.g. <number><operator><identifier> may not be valid and would be picked up by syntax analysis.
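A minimal sketch of the check, assuming the language's rules can be written as a short list of allowed token patterns. `VALID_PHRASES` is hypothetical; a real parser works from a full grammar instead of a lookup list.

```python
# Hypothetical rules: the only phrases this toy language allows.
VALID_PHRASES = [
    ["identifier", "operator", "number"],                         # age = 17
    ["identifier", "operator", "identifier"],                     # total = age
    ["keyword", "open_bracket", "identifier", "close_bracket"],   # print(age)
]

def find_syntax_errors(phrases):
    """Return the line numbers of phrases that match none of the rules."""
    return [
        line_number
        for line_number, phrase in enumerate(phrases, start=1)
        if phrase not in VALID_PHRASES
    ]

phrases = [
    ["identifier", "operator", "number"],   # age = 17
    ["number", "operator", "identifier"],   # 17 = age
]
print(find_syntax_errors(phrases))  # [2]
```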
Semantic Analysis 
It is possible to create a sequence of tokens which is valid syntax but is not a valid program. This is what semantic analysis checks for. E.g. <if><identifier><operator><number> is valid syntax; however, if the identifier has not previously been declared, it is not a semantically valid program.
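One semantic check, using an identifier before it has been declared, can be sketched as follows. This assumes each line has already been tokenised into (kind, text) pairs and that an assignment is what declares an identifier; both are simplifications for illustration.

```python
def find_undeclared(lines):
    """Return (line_number, name) for each identifier used before assignment."""
    declared, errors = set(), []
    for line_number, tokens in enumerate(lines, start=1):
        is_assignment = len(tokens) > 1 and tokens[1] == ("operator", "=")
        # In an assignment, only the right-hand side counts as a "use".
        used = tokens[2:] if is_assignment else tokens
        for kind, text in used:
            if kind == "identifier" and text not in declared:
                errors.append((line_number, text))
        if is_assignment:
            declared.add(tokens[0][1])
    return errors

lines = [
    [("identifier", "age"), ("operator", "="), ("number", "17")],
    [("keyword", "if"), ("identifier", "height"), ("operator", ">"), ("number", "150")],
]
print(find_undeclared(lines))  # [(2, 'height')]
```

Line 2 follows the pattern <if><identifier><operator><number>, so it passes syntax analysis, but `height` was never declared.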
Code Generation 
Once the program has been checked, the compiler generates the machine code. It may do this in several "passes" over the code because code optimisation will also take place.
Code Optimisation
Aims to:
reduce redundant instructions
replace inefficient code with code that achieves the same result but in a more efficient way
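One common optimisation, constant folding, can be sketched with Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). This is an illustration working on Python source code, not a description of how any particular compiler optimises its machine code.

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric constants with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fold = FOLDABLE.get(type(node.op))
        both_numbers = all(
            isinstance(side, ast.Constant) and isinstance(side.value, (int, float))
            for side in (node.left, node.right)
        )
        if fold and both_numbers:
            result = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(result, node)
        return node

def optimise(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(optimise("seconds = 60 * 60 * 24"))  # seconds = 86400
```

The optimised line gives the same result but the two multiplications are done once at compile time instead of every time the program runs.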
Libraries 
Sets of pre-written functions. A programmer can write their own libraries. A library function can be called within a program.
Linker 
The linker needs to put the appropriate memory addresses in place so that the program can call and return from a library function.
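A minimal sketch of that address-fixing step, assuming the compiled program refers to library functions by name and the linker knows where each one sits. The instruction names, function names and addresses are all hypothetical.

```python
def link(program, library_addresses):
    """Replace each named CALL target with the library function's address."""
    linked = []
    for instruction, operand in program:
        if instruction == "CALL" and isinstance(operand, str):
            if operand not in library_addresses:
                raise NameError(f"unresolved library function: {operand}")
            operand = library_addresses[operand]
        linked.append((instruction, operand))
    return linked

library_addresses = {"sqrt": 200, "print": 240}  # hypothetical addresses
program = [("PUSH", 16), ("CALL", "sqrt"), ("CALL", "print"), ("HALT", None)]
print(link(program, library_addresses))
# [('PUSH', 16), ('CALL', 200), ('CALL', 240), ('HALT', None)]
```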
Loader
The job of the loader is to copy the program and any linked subroutines into main memory so that it can run. When the executable code was created, it may have assumed the program would be loaded at memory address 0. However, the memory addresses in the program will need to be relocated by the loader because some memory will already be in use.
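Relocation can be sketched as adding the load address to every address in the program. The instruction names and the set `ADDRESS_INSTRUCTIONS` are hypothetical; a real executable carries a relocation table telling the loader which values are addresses.

```python
# Hypothetical: the instructions whose operand is a memory address.
ADDRESS_INSTRUCTIONS = {"JMP", "CALL"}

def relocate(program, base_address):
    """Shift every address operand by the address the program is loaded at."""
    relocated = []
    for instruction, operand in program:
        if instruction in ADDRESS_INSTRUCTIONS:
            operand += base_address
        relocated.append((instruction, operand))
    return relocated

# Program created on the assumption that it loads at address 0.
program = [("LOAD_CONST", 17), ("CALL", 4), ("JMP", 0)]
print(relocate(program, 1000))
# [('LOAD_CONST', 17), ('CALL', 1004), ('JMP', 1000)]
```

The constant 17 is data, not an address, so it is left unchanged.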