Natural Language Processing (NLP)
A field of artificial intelligence that gives machines the ability to read, understand, and derive meaning from human languages
The automatic manipulation of natural language, such as speech and text, by software
NLP Applications
Language Translator
Social Media Monitoring
Chatbots
Survey Analysis
Targeted Advertising
Hiring and Recruitment
Voice Assistants
Grammar Checkers
Email Filtering
Natural Language
Natural language is hard for software to process because it is messy and has few strict rules, yet humans can easily understand each other most of the time
It is highly ambiguous and constantly evolving
Linguistics
The scientific study of language, including grammar, semantics, and phonetics
Computational Linguistics
The modern study of linguistics using the tools of computer science
Statistical Natural Language Processing
The engineering-oriented, empirical approach to NLP, which relies on machine learning and statistical techniques
Linguistics is a large topic of study, and although the statistical approach to NLP has shown great success in some areas, there is still room for, and great benefit from, the classical top-down methods
Natural Language Processing (NLP)
Automatic computational processing of human languages, including algorithms that take human-produced text as input and algorithms that produce natural-looking text as output
Tokenizing Text Data
Using the Natural Language Toolkit (NLTK) Python package to break text down into smaller pieces for analysis
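A minimal sketch of tokenizing with NLTK, assuming the package is installed (pip install nltk). It uses wordpunct_tokenize and RegexpTokenizer because both are regex-based and need no extra data download; NLTK's word_tokenize and sent_tokenize are the more common choices but require the Punkt tokenizer data. The sample sentence is made up for illustration.

```python
from nltk.tokenize import RegexpTokenizer, wordpunct_tokenize

text = "Tokenizing text is the first step."

# Split into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(text)
print(tokens)
# ['Tokenizing', 'text', 'is', 'the', 'first', 'step', '.']

# Keep only word-like tokens by supplying our own pattern.
words_only = RegexpTokenizer(r"\w+").tokenize(text)
print(words_only)
# ['Tokenizing', 'text', 'is', 'the', 'first', 'step']
```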
Natural Language Processing (NLP)
Diego Lopez Yse: 'Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software.'
Jason Brownlee: 'Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software.'
Tokenization
The process of dividing text into a set of pieces, such as words or sentences; these pieces are called tokens
The input can be a phrase, a sentence, a paragraph, or an entire text document
Tokens can be words, numbers, or punctuation marks
Tokenization is usually the first step in processing text data for NLP
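To make the idea of sentence tokens, word tokens, and token types concrete, here is a standard-library sketch. The function names (simple_sentence_tokenize, simple_word_tokenize, token_kind) are hypothetical helpers written for these notes, and the regular expressions are deliberately naive: the sentence splitter will break on abbreviations such as "Dr.", which real tokenizers handle.

```python
import re

def simple_sentence_tokenize(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def simple_word_tokenize(text):
    """Return runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def token_kind(token):
    """Classify a token as a word, a number, punctuation, or other."""
    if token.isdigit():
        return "number"
    if token.isalpha():
        return "word"
    if not token.isalnum():
        return "punctuation"
    return "other"  # mixed tokens such as 'b2b'

text = "NLP has 9 common uses. Is tokenization first? Yes!"
sentences = simple_sentence_tokenize(text)
# ['NLP has 9 common uses.', 'Is tokenization first?', 'Yes!']

words = simple_word_tokenize(sentences[0])
# ['NLP', 'has', '9', 'common', 'uses', '.']

kinds = [(t, token_kind(t)) for t in words]
print(sentences)
print(kinds)
```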
Stemming
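The process of reducing the morphological variants of a word to a common root/base form (the stem), typically by stripping suffixes; the stem is not always a valid word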
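A toy suffix-stripping stemmer can illustrate the idea. This is a sketch for these notes, not the Porter algorithm or any NLTK stemmer: the suffix list, the min_stem parameter, and the name simple_stem are all assumptions chosen for illustration.

```python
# Suffixes are tried in this order; the first match wins.
SUFFIXES = ("ing", "ed", "es", "s")

def simple_stem(word, min_stem=3):
    """Strip one known suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["jumping", "jumped", "jumps", "studies", "running", "bus"]:
    print(w, "->", simple_stem(w))
# jumping -> jump
# jumped -> jump
# jumps -> jump
# studies -> studi
# running -> runn
# bus -> bus
```

The outputs "studi" and "runn" show why stems are not guaranteed to be dictionary words: the variants are grouped under one key, but nothing checks that the key is a real word.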
Lemmatization
The process of grouping together the different inflected forms of a word so they can be analyzed as a single item
Unlike stemming, lemmatization takes context into account, such as the word's part of speech, and returns a dictionary form (the lemma)
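The role of context can be sketched with a tiny lookup table keyed by word and part of speech. The table LEMMAS, its tag names, and the lemmatize function are hypothetical and hand-written for illustration; a real lemmatizer (for example NLTK's WordNetLemmatizer) relies on a full lexical database instead.

```python
# Hand-written (word, part of speech) -> lemma table, for illustration only.
LEMMAS = {
    ("was", "verb"): "be",
    ("are", "verb"): "be",
    ("mice", "noun"): "mouse",
    ("better", "adj"): "good",
    ("meeting", "verb"): "meet",     # "we are meeting today"
    ("meeting", "noun"): "meeting",  # "the meeting ran long"
}

def lemmatize(word, pos):
    """Look up the lemma for a word used as `pos`; fall back to the word itself."""
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())

print(lemmatize("Mice", "noun"))     # mouse
print(lemmatize("meeting", "verb"))  # meet
print(lemmatize("meeting", "noun"))  # meeting
print(lemmatize("table", "noun"))    # table (not in the table, returned as-is)
```

The two results for "meeting" are the point: the same surface form maps to different lemmas depending on how it is used, which a suffix-stripping stemmer cannot distinguish.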
Part of Speech (POS)
A category that describes how a word is used in a sentence
Main Parts of Speech
Nouns
Pronouns
Adjectives
Verbs
Adverbs
Prepositions
Conjunctions
Interjections
POS Tagging
Labelling each word in a text with its appropriate part of speech
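As a rough illustration of tagging, the sketch below combines a small lexicon with suffix rules. The lexicon, the tag names (DET, PRON, ADP, CONJ, ADV, VERB, NOUN), and the function simple_pos_tag are assumptions made up for these notes; trained taggers such as nltk.pos_tag use statistical models rather than rules like these.

```python
# Closed-class words are looked up directly (tiny illustrative lexicon).
LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "she": "PRON", "he": "PRON", "it": "PRON",
    "over": "ADP", "in": "ADP", "on": "ADP",
    "and": "CONJ", "but": "CONJ",
}

def simple_pos_tag(tokens):
    """Tag each token using the lexicon first, then suffix heuristics."""
    tagged = []
    for token in tokens:
        lower = token.lower()
        if lower in LEXICON:
            tag = LEXICON[lower]
        elif lower.endswith("ly"):
            tag = "ADV"
        elif lower.endswith(("ing", "ed")):
            tag = "VERB"
        else:
            tag = "NOUN"  # default guess for unknown words
        tagged.append((token, tag))
    return tagged

print(simple_pos_tag(["The", "dog", "barked", "loudly"]))
# [('The', 'DET'), ('dog', 'NOUN'), ('barked', 'VERB'), ('loudly', 'ADV')]
```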
Chunking
The process of dividing text data into meaningful pieces (chunks), such as phrases, for further analysis
Chunking differs from tokenization in that it aims to extract meaningful pieces of text rather than only splitting the text into units
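One common reading of "meaningful pieces" is noun-phrase chunking over POS-tagged tokens. The sketch below assumes that reading and a simple pattern (an optional determiner, any adjectives, then one or more nouns); the tag names and the function noun_phrase_chunks are hypothetical and written for illustration.

```python
def noun_phrase_chunks(tagged):
    """Collect chunks matching: optional DET, any ADJ, one or more NOUN."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":                  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":     # zero or more adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":    # one or more nouns
            k += 1
        if k > j:  # at least one noun was found, so the pattern matched
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

tagged = [
    ("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumped", "VERB"),
    ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN"),
]
print(noun_phrase_chunks(tagged))
# ['The quick fox', 'the lazy dog']
```

Tokenization alone would give eight separate tokens here; the chunker groups them into two phrases that each carry meaning as a unit.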
Sentiment Analysis
The process of determining the sentiment of a piece of text, for example whether it is positive or negative
Sentiment analysis is one of the most popular applications of natural language processing
A Naive Bayes classifier can be used to build a sentiment analyzer
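A minimal from-scratch sketch of a Naive Bayes sentiment analyzer, using word counts with add-one (Laplace) smoothing. The four training sentences and the labels "pos" and "neg" are invented toy data, and the function names train and predict are assumptions; a library classifier such as NLTK's NaiveBayesClassifier would normally be trained on a much larger labelled corpus.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log prior plus log likelihood."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids log(0) for words unseen in this class.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration.
examples = [
    ("great movie loved it", "pos"),
    ("wonderful acting and great story", "pos"),
    ("terrible plot hated it", "neg"),
    ("boring and terrible acting", "neg"),
]
model = train(examples)
print(predict(model, "great story"))  # pos
print(predict(model, "boring plot"))  # neg
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word log probabilities can simply be added.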