a program that converts high-level source code into low-level object code
it's then ready to be executed by computer
High-Level Code
written and understood by programmer but not the computer
Low-Level Code
can be executed by computer but not easily understood by programmer
Compiler
translates code all at once after carrying out checks and reporting back errors
initial process is longer than using an interpreter or assembler
if changes need to be made, whole program must be recompiled
once compiled, code can only be executed on certain devices as compiled code is specific to a certain processor type and OS
can be run without a translator present
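A minimal sketch of the "translate all at once, report errors first" behaviour described above. It uses Python's built-in compile() as a stand-in for a compiler, which is an assumption made for illustration: Python compiles to bytecode, not to processor-specific machine code. The name try_compile and the two source strings are hypothetical.

```python
def try_compile(source):
    """Translate the whole source at once; return (code_object, error_message)."""
    try:
        # compile() checks the entire source before any of it is executed
        return compile(source, "<demo>", "exec"), None
    except SyntaxError as err:
        return None, f"line {err.lineno}: {err.msg}"

good = "x = 2\ny = x * 3\n"
bad = "x = 2\ny = (x * 3\n"    # missing closing bracket

code, error = try_compile(good)
namespace = {}
exec(code, namespace)           # run the already-translated code
print(namespace["y"])           # 6

code, error = try_compile(bad)
print(code, error)              # no code object is produced, only an error report
```

Note that for the second program nothing runs at all, not even the valid first line, because the error is found before execution starts.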
Interpreter
translates and executes code line-by-line
stops and produces error if line contains an error
initially appears faster than a compiler as code is executed immediately, but is slower overall as code must be translated each time it is executed
stopping at the first error makes interpreters useful for testing code and pinpointing errors
requires interpreter to be present on every device the code is run on
code can be executed on a range of platforms with right interpreter
this makes code more portable
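A rough sketch of line-by-line translation and execution, assuming a toy program given as a list of one-line Python statements. The function name interpret and the sample program are hypothetical; a real interpreter is considerably more involved.

```python
def interpret(lines):
    """Translate and execute one line at a time; stop at the first bad line."""
    variables = {}
    for number, line in enumerate(lines, start=1):
        try:
            exec(line, {}, variables)    # translate and execute this line only
        except Exception as err:
            return variables, f"line {number}: {type(err).__name__}"
    return variables, None

program = ["a = 1", "b = a + 1", "c = b +", "d = 99"]
state, error = interpret(program)
print(state)    # {'a': 1, 'b': 2}
print(error)    # line 3: SyntaxError
```

Lines 1 and 2 have already run by the time the error on line 3 is found, and line 4 is never reached, which is why the error is easy to pinpoint.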
Assembly Code
low-level language
the next step up from machine code
platform specific
instructions used are dependent on the instruction set of the processor
Assembler
translates assembly code into machine code
each line of assembly corresponds to roughly one machine code instruction
translated on an almost 1 to 1 basis
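The 1 to 1 translation can be sketched with a toy assembler. The instruction set here is entirely hypothetical (a made-up 8-bit format with a 4-bit opcode and a 4-bit operand) and does not match any real processor.

```python
# Hypothetical instruction set: 4-bit opcode followed by 4-bit operand
OPCODES = {"LDA": 0b0001, "ADD": 0b0010, "STA": 0b0011, "HLT": 0b1111}

def assemble(source):
    """Turn each line of assembly into exactly one 8-bit machine instruction."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else 0
        instruction = (OPCODES[mnemonic] << 4) | operand
        machine_code.append(format(instruction, "08b"))
    return machine_code

program = """
LDA 5
ADD 6
STA 7
HLT
"""
print(assemble(program))
# ['00010101', '00100110', '00110111', '11110000']
```

Four lines of assembly go in and four machine instructions come out. A different processor would need a different OPCODES table, which is why assembly code is platform specific.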
Stages of Compilation
lexical analysis
syntax analysis
code generation
optimisation
Lexical Analysis
first stage of compilation
whitespace and comments are removed from code
remaining code is checked for keywords/names of variables and constants
these are replaced with tokens and info about each is stored in a symbol table
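A minimal sketch of this stage, assuming a toy language where comments start with # and the keyword set is the hypothetical KEYWORDS below. Token names such as ID and OP are illustrative choices, and characters the pattern does not recognise are silently skipped here, whereas a real lexer would report them.

```python
import re

KEYWORDS = {"if", "while", "print"}    # hypothetical keyword set
TOKEN_PATTERN = re.compile(r"\d+|[A-Za-z_]\w*|[=+\-*/()<>]")

def lex(source):
    """Return (tokens, symbol_table) for the given source text."""
    tokens, symbol_table = [], {}
    for line in source.splitlines():
        line = line.split("#", 1)[0]                   # remove the comment
        for lexeme in TOKEN_PATTERN.findall(line):     # whitespace is skipped
            if lexeme in KEYWORDS:
                tokens.append(("KEYWORD", lexeme))
            elif lexeme[0].isalpha() or lexeme[0] == "_":
                # identifiers are replaced by their index in the symbol table
                index = symbol_table.setdefault(lexeme, len(symbol_table))
                tokens.append(("ID", index))
            elif lexeme.isdigit():
                tokens.append(("NUMBER", lexeme))
            else:
                tokens.append(("OP", lexeme))
    return tokens, symbol_table

source = "total = 5  # starting value\ntotal = total + count\n"
tokens, symbol_table = lex(source)
print(tokens)
print(symbol_table)    # {'total': 0, 'count': 1}
```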
Syntax Analysis
tokens are checked against grammar and rules of the programming language
tokens that break the rules are flagged as syntax errors and added to a list of errors
abstract syntax tree is produced
more detail about identifiers is added to symbol table
semantic analysis is carried out
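As an illustration only, Python's standard ast module can show both outcomes of this stage: an abstract syntax tree for valid code, or a reported syntax error. The helper name syntax_check is hypothetical, and Python's parser stops at the first syntax error rather than building up a full list as described above.

```python
import ast

def syntax_check(source):
    """Return (tree, errors): an abstract syntax tree, or a list of syntax errors."""
    try:
        return ast.parse(source), []
    except SyntaxError as err:
        return None, [f"line {err.lineno}: {err.msg}"]

# Valid code: the brackets decide how the tree is nested
tree, errors = syntax_check("total = (1 + 2) * 3")
print(ast.dump(tree.body[0].value))

# Incomplete set of brackets: no tree, one error recorded
tree, errors = syntax_check("total = (1 + 2 * 3")
print(tree, errors)
```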
Syntax Errors - Examples
undeclared variable type
incomplete set of brackets
Semantic Errors - Examples
multiple declaration
undeclared identifiers
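Both semantic error examples can be sketched with a simple check over a list of statements. The ("declare", name) and ("use", name) tuple format is a hypothetical simplification; a real compiler would walk the abstract syntax tree and consult the symbol table.

```python
def semantic_check(statements):
    """statements: list of ("declare", name) or ("use", name) tuples."""
    declared, errors = set(), []
    for number, (kind, name) in enumerate(statements, start=1):
        if kind == "declare":
            if name in declared:
                errors.append(f"statement {number}: multiple declaration of '{name}'")
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(f"statement {number}: undeclared identifier '{name}'")
    return errors

program = [("declare", "x"), ("use", "x"), ("declare", "x"), ("use", "y")]
for message in semantic_check(program):
    print(message)
# statement 3: multiple declaration of 'x'
# statement 4: undeclared identifier 'y'
```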
Code Generation
abstract syntax tree produced in syntax analysis stage is used to produce machine code
Optimisation
searches through code for areas that could be more efficient
aims to make code faster to execute but can add to overall time taken for compilation
redundant parts of code are detected and removed
repeated sections may be grouped and replaced with more efficient versions
excessive optimisation risks altering the way the program behaves
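One common optimisation, constant folding, can be sketched as below. This is a single illustrative technique chosen for the example, not a description of what any particular compiler does. Expressions are represented as hypothetical (operator, left, right) tuples, with variable names as strings.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """node is a number, a variable name (str), or a tuple (op, left, right)."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)             # both sides known at compile time
    if op == "*" and 1 in (left, right):
        return right if left == 1 else left     # multiplying by 1 is redundant
    return (op, left, right)

# Represents: (x * 1) + (2 * (3 + 4))
expression = ("+", ("*", "x", 1), ("*", 2, ("+", 3, 4)))
print(fold(expression))    # ('+', 'x', 14)
```

The folded expression gives the same result for every value of x but needs one operation at run time instead of four.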
Linkers
software responsible for linking external modules and libraries used in the code
Linkers - Types
static
dynamic
Static
modules/libraries are added directly to main file
this increases file size
updates externally will not affect program
because of this, a specific version of the library can be used
Dynamic
addresses of modules/libraries are included in file they're mentioned in
when program is run, loader retrieves the module/library at the address so it can be executed
files remain small and they are affected by external updates
program code does not need to be rewritten when a module/library is updated
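The difference between the two types can be imitated with a toy model. This is only an analogy under stated assumptions: the "library" is a plain dictionary of functions, and link_static and link_dynamic are hypothetical names. Real linkers work on object files and memory addresses.

```python
library = {"greet": lambda: "hello v1"}    # hypothetical external library

def link_static(lib):
    # copy the library contents into the program itself
    return {"embedded": dict(lib)}

def link_dynamic(lib):
    # keep only a reference, resolved each time the program runs
    return {"reference": lib}

static_program = link_static(library)
dynamic_program = link_dynamic(library)

library["greet"] = lambda: "hello v2"      # library is updated externally

print(static_program["embedded"]["greet"]())      # hello v1
print(dynamic_program["reference"]["greet"]())    # hello v2
```

The statically linked program keeps the version it was built with, while the dynamically linked program picks up the update without being changed.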
Loaders
programs provided by OS
when file is executed, loader retrieves the library/subroutine from given memory location
Libraries
precompiled programs that can be added into other programs using static or dynamic linking
Libraries - Advantages
ready-to-use
already tested, so usually error free
save time developing and testing modules
can be reused in multiple programs
use already developed functions to save programmers having to rewrite them