Syntax

Cards (51)

  • Syntax
    The building blocks of programming syntax
  • Context-free grammar
    • Described in Backus-Naur Form (BNF)
    • Representation of syntactic structure
    • Issues of BNFs in programming language
  • In the early days of programming languages, both the syntax and semantics of a language were described by lengthy English explanations
  • Noam Chomsky developed the idea of context-free grammars, and John Backus, with contributions by Peter Naur, developed a notational system for describing these grammars that was used for the first time to describe the syntax of Algol60

    1950s
  • Lexical structure of a programming language

    The structure of its tokens, or words
  • Typical token categories
    • Reserved words/keywords
    • Literals/constants
    • Special symbols
    • Identifiers
  • Reserved words

    An identifier cannot have the same character string as a reserved word
  • Predefined identifiers
    Identifiers that have been given an initial meaning for all programs in the language but that are capable of redefinition
  • Free-format language
    Format has no effect on the program structure
  • Fixed format language
    All tokens must occur in pre-specified locations on the page
  • Regular expressions
    Describe tokens formally
  • Context-free grammar
    Consists of a series of grammar rules, with a left-hand side that is a single phrase structure name, followed by the metasymbol "→", followed by a right-hand side consisting of a sequence of items that can be symbols or other phrase structure names
  • Nonterminals
    Names for phrase structures, broken down into further phrase structures
  • Terminals
    Words or token symbols
  • Grammar rules

    Also called productions, "produce" the strings of the language using derivations
  • Backus-Naur Form (BNF)
    Productions are in this form if they are as given using only the metasymbols "" and "|"
  • Context-free grammar
    Defines a language, called the language of the grammar
  • Grammar rule examples
    • <digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    • <nat> → <digit>
    • <nat> → <digit> <nat>
  • Parse tree
    Graphical depiction of the replacement process in a derivation, labeled by nonterminals at interior nodes and terminals at leaves
  • Abstract syntax tree
    Abstracts the essential structure of the parse tree, may do away with redundant terminals
  • Ambiguity
    Two different derivations can lead to the same parse tree or syntax tree
  • Left-recursive rule
    Causes left-association
  • Right-recursive rule

    Causes right-association
  • Lexical conventions
    Specify details of formatting, such as white-space conventions, separate from the grammar
  • Reserved words
    Fixed strings of characters that are tokens themselves and cannot be used as identifiers
  • Predefined identifiers
    Fixed strings that are given a predefined meaning in a language, but this meaning can be changed by redefining it within a program
  • Syntax and semantics can become interdependent in languages when semantic information must be used to distinguish ambiguous parsing situations
  • During its scanning phase, a translator collects sequences of character from the input program and forms them into tokens.
  • During its parsing phase, the translator then processes these tokens, thereby determining the program's syntactic structure
  • Reserved words

    sometimes called keywords (if and while)
  • Literals or constants
    such as 42 (numeric literal) or "hello" ( a string litral)
  • Special symbols

    ; , <=, +
  • Identifiers
    such as x24, monthly_balance, or putchar
  • double if;

    This variable declaration is illegal in C because 'if' is a reserved word.
  • Examples of predefined identifiers are all standard data types in Ada, such as integer and boolean, and in Python, such as round and print.
  • The format of a program can affect the way tokens are recognized
  • The principle of longest substring requires that certain tokens can be separated by "token delimiters" or white space
  • Most modern languages are free format
  • Tokens in a programming language are often described in English, but they can also be described formally by regular expressions.
  • Three basic operations of Regular Expressions
    1. Concatenation
    2. Repetition
    3. Choice or Selection