Week 6 Language

Cards (18)

  • Natural Language Processing (NLP) is the field concerned with a computer's ability to understand human language
  • syntax is structure, semantics is meaning
  • Formal Grammar:
    a set of rules that defines which sentences are valid within a language
  • One type of Formal Grammar is Context-Free Grammar
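    a small sketch to illustrate (not from the course, and assuming the nltk library is installed) - a toy context-free grammar parsed with nltk:

        import nltk

        # a toy context-free grammar (made-up rules, for illustration only)
        grammar = nltk.CFG.fromstring("""
            S -> NP VP
            NP -> D N | N
            VP -> V | V NP
            D -> "the" | "a"
            N -> "she" | "city" | "car"
            V -> "saw" | "walked"
        """)

        parser = nltk.ChartParser(grammar)
        for tree in parser.parse("she saw the city".split()):
            print(tree)  # prints the parse tree built from the grammar rules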
  • n-gram:
    a contiguous sequence of n items from a text sample
    tokenisation:
    splitting a sequence of text into pieces (tokens), such as words, from which n-grams can be built
    n-grams can recognise commonly paired words (and thus interpret possible meaning)
    example:
    common bi-gram tokens: 'of the', 'is the', 'it was', 'it is', 'he did', 'and I', 'and the', etc.
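    a quick Python sketch to illustrate (my own example, not from the course) - counting bi-grams in a token list:

        from collections import Counter

        def ngrams(tokens, n):
            # slide a window of length n over the token list
            return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

        tokens = "it was the best of times it was the worst of times".split()
        print(Counter(ngrams(tokens, 2)).most_common(3))  # the most frequent bi-grams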
  • Markov Chain:
    can be used to predict the next word, based on which words commonly follow one another in n-gram tokens
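    an illustrative sketch (not from the course) of a word-level Markov chain built from bi-gram pairs:

        import random
        from collections import defaultdict

        def build_chain(tokens):
            # map each word to the list of words observed to follow it
            chain = defaultdict(list)
            for a, b in zip(tokens, tokens[1:]):
                chain[a].append(b)
            return chain

        def generate(chain, start, length=8):
            word, out = start, [start]
            for _ in range(length):
                if word not in chain:
                    break
                word = random.choice(chain[word])  # sample the next word by observed frequency
                out.append(word)
            return " ".join(out)

        tokens = "it was the best of times it was the worst of times".split()
        print(generate(build_chain(tokens), "it"))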
  • Bag-of-Words Model:
    model representing text as an unordered collection of words - word order is ignored, only word counts matter
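    a minimal sketch of the idea (my own illustration):

        from collections import Counter

        def bag_of_words(text):
            # word order is discarded; only the counts remain
            return Counter(text.lower().split())

        print(bag_of_words("The dog chased the cat"))
        # Counter({'the': 2, 'dog': 1, 'chased': 1, 'cat': 1})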
  • Naive Bayes Rule:
    P(b | a) = P(a | b) P(b) / P(a)
    i.e. the probability of b given a equals the probability of a given b, times the probability of b, divided by the probability of a
    the 'naive' assumption is that all words in a text are independent of one another
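    a toy sketch of Naive Bayes classification (the probabilities below are made-up numbers, purely for illustration):

        # toy P(word | class) estimates and class priors (invented values)
        p_word = {
            "positive": {"great": 0.30, "movie": 0.20, "boring": 0.01},
            "negative": {"great": 0.02, "movie": 0.20, "boring": 0.25},
        }
        prior = {"positive": 0.5, "negative": 0.5}

        def classify(words):
            scores = {}
            for cls in prior:
                score = prior[cls]
                for w in words:
                    score *= p_word[cls].get(w, 1e-6)  # multiply - the independence assumption
                scores[cls] = score
            return max(scores, key=scores.get)

        print(classify(["great", "movie"]))  # -> positive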
  • Additive Smoothing:
    within a Naive Bayes model, adding a value α to every count in the distribution, to smooth the data - preventing any value being left at 0 (which would otherwise nullify the whole product, regardless of the other contributing values)
    one form is Laplace smoothing, wherein α = 1
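    a short illustrative sketch of Laplace smoothing over raw word counts:

        def smoothed_probs(counts, alpha=1):
            # add alpha to every count so no probability is left at 0 (alpha = 1 is Laplace smoothing)
            total = sum(counts.values()) + alpha * len(counts)
            return {w: (c + alpha) / total for w, c in counts.items()}

        counts = {"great": 3, "movie": 2, "boring": 0}
        print(smoothed_probs(counts))
        # 'boring' now gets a small non-zero probability instead of 0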
  • Word Representation:
    representing words numerically
    one type is one-hot representation, wherein each word is represented as a vector containing a single 1, with the remaining positions 0s
  • One-hot word representation is inconvenient for large vocabularies and does not capture similarity between words
    Distributed Representation does, by assigning each word a vector of floating-point values spread across many dimensions - with similar-meaning words having similar vectors
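    a small sketch of why one-hot vectors scale badly and carry no similarity (my own example):

        vocab = ["he", "wrote", "a", "book", "novel"]

        def one_hot(word):
            # a single 1 at the word's index, 0 everywhere else; vector length = vocabulary size
            return [1 if w == word else 0 for w in vocab]

        print(one_hot("book"))   # [0, 0, 0, 1, 0]
        print(one_hot("novel"))  # [0, 0, 0, 0, 1] -> nothing indicates 'book' and 'novel' are similar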
  • Computers can estimate a word's meaning by analysing the words commonly surrounding it
  • Word2Vec:
    model that generates word vectors (as used in distributed representation)
    moves word vectors such that similar-meaning words are closer together in value
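    a sketch of comparing word vectors by cosine similarity (the vectors below are made up, standing in for learned embeddings):

        import math

        def cosine(u, v):
            # cosine similarity: 1 means the vectors point the same way, 0 means unrelated
            dot = sum(a * b for a, b in zip(u, v))
            return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

        # invented 3-dimensional vectors, for illustration only
        book  = [0.9, 0.1, 0.3]
        novel = [0.8, 0.2, 0.35]
        cat   = [0.1, 0.9, 0.0]

        print(cosine(book, novel))  # close to 1 -> similar meaning
        print(cosine(book, cat))    # much lower -> less similar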
  • A typical ANN must have a fixed-size output
    To get around this (and thus allow for generative text), a Recurrent Neural Network can be used - which passes the hidden state back into the same neural network, repeatedly
  • In an encoder-decoder arrangement, a Recurrent Neural Network encodes the input into a hidden state, then repeatedly decodes that state to produce the output sequence
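    a minimal numpy sketch of the recurrence (illustration only, with random weights): the same step function is reused at every position, mixing the current input with the previous hidden state:

        import numpy as np

        def rnn_step(x, h_prev, W_xh, W_hh, b):
            # the new hidden state combines the current input with the previous hidden state
            return np.tanh(W_xh @ x + W_hh @ h_prev + b)

        rng = np.random.default_rng(0)
        W_xh, W_hh, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

        h = np.zeros(4)                        # hidden state starts empty
        for x in rng.normal(size=(5, 3)):      # feed a sequence of 5 input vectors
            h = rnn_step(x, h, W_xh, W_hh, b)  # the same weights are reused at every step
        print(h)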
  • Attention:
    weighing the importance of each input word when deciding which word to generate next
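    a sketch of scaled dot-product attention, the standard formulation of this weighting (my own illustration with random vectors):

        import numpy as np

        def attention(Q, K, V):
            # weight each value by how well its key matches the query, then mix the values
            scores = Q @ K.T / np.sqrt(K.shape[-1])
            weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
            return weights @ V

        rng = np.random.default_rng(1)
        Q = rng.normal(size=(1, 8))   # query for the word being generated
        K = rng.normal(size=(5, 8))   # keys for the 5 input words
        V = rng.normal(size=(5, 8))   # values for the 5 input words
        print(attention(Q, K, V).shape)  # (1, 8): a weighted mix of the input values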
  • Transformers:
    the transformer architecture is a type of ANN in which input words can be processed in parallel, rather than sequentially as in an RNN
  • A Transformer considers each input word's position (positional encoding) and its relationship to the other words (self-attention, capturing meaning)
    Each input word is turned into an encoded representation, which is then attended to a second time ('attention') when generating each output word
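    a sketch of the sinusoidal positional encoding used to inject word position, as described in the original transformer paper "Attention Is All You Need" (illustration only):

        import numpy as np

        def positional_encoding(seq_len, dim):
            # sine on even dimensions, cosine on odd dimensions, at different frequencies
            pos = np.arange(seq_len)[:, None]
            i = np.arange(dim)[None, :]
            angles = pos / np.power(10000, (2 * (i // 2)) / dim)
            return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

        print(positional_encoding(4, 8).round(2))  # one row of position information per input word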