Natural Language Processing (NLP) is a computer's ability to understand human language
syntax is structure, semantics is meaning
Formal Grammar:
Set of rules that defines which sentences are valid within a language
One type of Formal Grammar is Context-Free Grammar
n-gram:
a contiguous sequence of n items from a text sample
tokenisation:
splitting a text sample into individual tokens (e.g. words or characters)
n-grams over these tokens can recognise commonly paired words (and thus interpret possible meaning)
example:
common bi-gram tokens: 'of the', 'is the', 'it was', 'it is', 'he did', 'and I', 'and the', etc.
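As a rough sketch (the sample sentence and helper names are illustrative, not from any particular library), tokenisation followed by bi-gram counting might look like this in Python:

```python
# Minimal sketch: tokenise a text sample, then count bi-grams.
import re
from collections import Counter

def tokenise(text):
    # Lower-case the text and keep only runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "it was the best of times, it was the worst of times"
tokens = tokenise(text)
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
# e.g. [(('it', 'was'), 2), (('was', 'the'), 2), (('of', 'times'), 2)]
```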
Markov Chain:
can be used to predict the next word, based on the probabilities of words that commonly follow one another in n-gram tokens
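A minimal sketch of this idea, using an illustrative toy corpus, builds a table of which words follow which and then samples from it:

```python
# Minimal sketch of a Markov chain over bi-grams: the next word is drawn
# from the words observed to follow the current word. Corpus is illustrative.
import random
from collections import defaultdict

corpus = "it was the best of times it was the worst of times".split()

# transitions[word] = list of words that followed `word` in the corpus
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(start, length=8):
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:          # dead end: no observed successor
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

print(generate("it"))   # e.g. "it was the best of times it was the"
```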
Bag-of-Words Model:
model representing text as an unordered collection of words, keeping word counts but ignoring word order
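A bag of words can be sketched with simple per-word counts (the example sentences are illustrative):

```python
# Minimal sketch of a bag-of-words representation: word order is discarded,
# only counts per word are kept.
from collections import Counter

documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

bags = [Counter(doc.split()) for doc in documents]
print(bags[0])   # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```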
Naive Bayes Rule:
probability of b given a equals probability of a given b, times probability of b, divided by probability of a: P(b|a) = P(a|b) P(b) / P(a)
assumes all words in text a are independent of each other (the 'naive' assumption)
Additive Smoothing:
within a Naive Bayes model, a value α is added to every count in the distribution to smooth the data, preventing any count from being 0 (which would otherwise make the whole product 0, regardless of the other contributing values)
one form is Laplace smoothing, wherein α = 1
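Putting the two ideas together, a sketch of Naive Bayes classification with additive smoothing (α = 1, i.e. Laplace smoothing) might look like this; the training data and class labels are made up for illustration:

```python
# Minimal sketch of Naive Bayes with additive (Laplace) smoothing.
from collections import Counter

train = [
    ("great fun loved it", "positive"),
    ("what a great film", "positive"),
    ("boring and slow", "negative"),
    ("not fun at all", "negative"),
]

alpha = 1.0
word_counts = {"positive": Counter(), "negative": Counter()}
class_counts = Counter()
vocab = set()

for text, label in train:
    class_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def predict(text):
    scores = {}
    for label in word_counts:
        # Prior: P(label)
        score = class_counts[label] / sum(class_counts.values())
        total = sum(word_counts[label].values())
        for word in text.split():
            # Smoothed likelihood: (count + alpha) / (total + alpha * |V|),
            # so an unseen word never zeroes out the whole product.
            score *= (word_counts[label][word] + alpha) / (total + alpha * len(vocab))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("fun film"))   # "positive"
```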
Word Representation:
representing words numerically
one type is one-hot representation, wherein words are represented as vectors containing a single 1, with the remaining positions 0
one-hot word representation is inconvenient for large vocabularies and does not capture similarity between words
Distributed Representation does, by assigning floating-point values spread across many dimensions - with similar-meaning words having similar vectors
Computers can estimate a word's meaning by analysing the words commonly surrounding it
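A small sketch of the contrast (the tiny vocabulary and the distributed values are hand-picked for illustration, not learned):

```python
# Minimal sketch: one-hot vectors vs hand-picked distributed vectors.
import numpy as np

vocab = ["cat", "dog", "car"]

# One-hot: one dimension per word, a single 1 and the rest 0s.
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(one_hot["cat"])                 # [1. 0. 0.]

# Distributed: dense floats; similar-meaning words get similar vectors.
distributed = {
    "cat": np.array([0.9, 0.1, 0.2]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.7]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(one_hot["cat"], one_hot["dog"]))          # 0.0 - no notion of similarity
print(cosine(distributed["cat"], distributed["dog"]))  # close to 1 - similar meaning
```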
Word2Vec:
model that generates word vectors (as used in distributed representation)
moves word vectors such that similar-meaning words are closer together in value
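As one possible sketch, the gensim library provides a Word2Vec implementation (assuming gensim 4.x is installed; the toy corpus below is far too small to learn meaningful vectors and is illustrative only):

```python
# Minimal sketch using gensim's Word2Vec to train word vectors.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["cat"][:5])                 # first 5 dimensions of the "cat" vector
print(model.wv.similarity("cat", "dog"))   # cosine similarity between the two vectors
```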
A typical ANN must have a fixed-size output
To get around this (and thus allow for generative text), a Recurrent Neural Network can be used - which passes the hidden state back into the same neural network, repeatedly
In an encoder-decoder setup, a Recurrent Neural Network encodes the input sequence into a hidden state, then decodes that hidden state step by step to produce the output
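A minimal sketch of the recurrence itself (forward pass only, random untrained weights, illustrative sizes):

```python
# Minimal sketch of a recurrent step in NumPy: the same weights are applied
# at every time step, and the hidden state is fed back into the network.
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3

W_xh = rng.normal(size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrence)

def rnn_forward(inputs):
    h = np.zeros(hidden_size)
    for x in inputs:                       # one step per input token
        h = np.tanh(W_xh @ x + W_hh @ h)   # previous hidden state is reused
    return h                               # final hidden state summarises the sequence

sequence = [rng.normal(size=input_size) for _ in range(5)]
print(rnn_forward(sequence))
```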
Attention:
weighting the importance of each input word when deciding the next word to generate
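One common formulation is scaled dot-product attention; a NumPy sketch (shapes and random inputs are illustrative) is:

```python
# Minimal sketch of scaled dot-product attention: each query scores every key,
# the scores are softmaxed, and the values are combined by those weights.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how relevant each input word is
    weights = softmax(scores, axis=-1)  # importance weights sum to 1 per query
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))   # one row per input word
output, weights = attention(X, X, X)      # self-attention: Q = K = V = X
print(weights.round(2))                   # each row: attention over all input words
```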
Transformers:
transformer architecture is a type of ANN that processes all input words at once, allowing parallelisation
a Transformer considers each input word's position (positional encoding) and its relationships to the other input words (self-attention)
input words are converted into encoded representations, which are passed through attention again when generating the output
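A hedged sketch using PyTorch's built-in encoder layer (assuming torch is installed; all sizes are illustrative, and the input is a dummy sentence of random token ids):

```python
# Minimal sketch: token embeddings plus a position signal, processed in
# parallel by a Transformer encoder layer with self-attention.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 100, 32, 6

embed = nn.Embedding(vocab_size, d_model)      # word meaning
positions = nn.Embedding(seq_len, d_model)     # word position
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, seq_len))       # a dummy sentence
x = embed(tokens) + positions(torch.arange(seq_len))      # position-aware input
encoded = layer(x)                                        # self-attention over all words at once
print(encoded.shape)    # torch.Size([1, 6, 32])
```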