Syntax

Created by

Haa Ruu

Cards (51)

Syntax 
The building blocks of programming syntax
Context-free grammar 
Described in Backus-Naur Form (BNF)
Representation of syntactic structure
Issues of BNFs in programming language
In the early days of programming languages, both the syntax and semantics of a language were described by lengthy English explanations
Noam Chomsky developed the idea of context-free grammars, and John Backus, with contributions by Peter Naur, developed a notational system for describing these grammars that was used for the first time to describe the syntax of Algol60 
1950s
Lexical structure of a programming language 
The structure of its tokens, or words
Typical token categories 
Reserved words/keywords
Literals/constants
Special symbols
Identifiers
Reserved words 
An identifier cannot have the same character string as a reserved word
Predefined identifiers 
Identifiers that have been given an initial meaning for all programs in the language but that are capable of redefinition
Free-format language 
Format has no effect on the program structure
Fixed format language 
All tokens must occur in pre-specified locations on the page
Regular expressions 
Describe tokens formally
Context-free grammar 
Consists of a series of grammar rules, with a left-hand side that is a single phrase structure name, followed by the metasymbol "→", followed by a right-hand side consisting of a sequence of items that can be symbols or other phrase structure names
Nonterminals 
Names for phrase structures, broken down into further phrase structures
Terminals 
Words or token symbols
Grammar rules 
Also called productions, "produce" the strings of the language using derivations
Backus-Naur Form (BNF) 
Productions are in this form if they are as given using only the metasymbols "→" and "|"
Context-free grammar 
Defines a language, called the language of the grammar
Grammar rule examples 
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<nat> → <digit>
<nat> → <digit> <nat>
Parse tree
Graphical depiction of the replacement process in a derivation, labeled by nonterminals at interior nodes and terminals at leaves
Abstract syntax tree
Abstracts the essential structure of the parse tree, may do away with redundant terminals
Ambiguity
Two different derivations can lead to the same parse tree or syntax tree
Left-recursive rule 
Causes left-association
Right-recursive rule 
Causes right-association
Lexical conventions
Specify details of formatting, such as white-space conventions, separate from the grammar
Reserved words
Fixed strings of characters that are tokens themselves and cannot be used as identifiers
Predefined identifiers
Fixed strings that are given a predefined meaning in a language, but this meaning can be changed by redefining it within a program
Syntax and semantics can become interdependent in languages when semantic information must be used to distinguish ambiguous parsing situations
During its scanning phase, a translator collects sequences of character from the input program and forms them into tokens.
During its parsing phase, the translator then processes these tokens, thereby determining the program's syntactic structure
Reserved words 
sometimes called keywords (if and while)
Literals or constants 
such as 42 (numeric literal) or "hello" ( a string litral)
Special symbols 
; , <=, +
Identifiers 
such as x24, monthly_balance, or putchar
double if; 
This variable declaration is illegal in C because 'if' is a reserved word.
Examples of predefined identifiers are all standard data types in Ada, such as integer and boolean, and in Python, such as round and print.
The format of a program can affect the way tokens are recognized
The principle of longest substring requires that certain tokens can be separated by "token delimiters" or white space
Most modern languages are free format
Tokens in a programming language are often described in English, but they can also be described formally by regular expressions.
Three basic operations of Regular Expressions
Concatenation
Repetition
Choice or Selection