Big data
A term used to describe the phenomenon of creating, curating and exploiting extremely large data sets to reveal patterns, trends and associations
Many of these big data sets are too large to be held easily in one physical location, so they are stored in the Cloud or spread across large servers in multiple locations (research is also under way into whether data centres can be housed under the sea)
Big Data can be used to analyse the behaviour of groups and individuals(!) or to spot trends or associations that could be useful to society (such as health-related issues)
Sources of big data
Mobile phones
Internet interactions, including search engines (Google)
Social media (Facebook, Instagram)
GPS devices (Google Maps, TomTom)
Smart speakers (Amazon Alexa, Google Home)
Buying behaviours (Amazon)
There are some practical problems with big data: storage, processing capacity, timeliness of analysis
Characteristics of big data
Velocity: speed of data generation and processing
Volume: amount generated / processed
Variety: structured vs. unstructured (free text) data
Value: monetisation and other uses
Veracity: trustworthiness of the data (e.g. biases and abnormalities) and of the decisions based on it
Variability: changes in data
Visualisation: summarising
Data analytics
Analysing raw data to extract useful trends and insights that help individuals and firms make decisions
Data analytics example
An analyst examines stock price data for UBS, RBS, Deutsche Bank, JP Morgan and Morgan Stanley. From this analysis, the analyst can say which stock has been the most volatile in the last quarter
The analyst's task is to comment on the volatility of stocks, not to make an investment decision
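The analyst's task can be sketched in a few lines of Python. This is a minimal illustration, not a standard method from these notes: it assumes volatility is measured as the sample standard deviation of daily returns, and the prices and the names "Bank A", "Bank B" and "Bank C" are invented for the example.

```python
from statistics import stdev

# Hypothetical daily closing prices, invented purely for illustration.
prices = {
    "Bank A": [100.0, 101.0, 99.5, 100.5, 100.0],
    "Bank B": [50.0, 53.0, 48.0, 54.0, 47.0],
    "Bank C": [20.0, 20.1, 20.0, 20.2, 20.1],
}

def daily_returns(series):
    """Fractional change from each day to the next."""
    return [(today - prev) / prev for prev, today in zip(series, series[1:])]

def volatility(series):
    """Assumed measure: sample standard deviation of daily returns."""
    return stdev(daily_returns(series))

volatilities = {name: volatility(series) for name, series in prices.items()}
most_volatile = max(volatilities, key=volatilities.get)
print(most_volatile, volatilities[most_volatile])
```

The output is a statement about past volatility only; whether to buy or sell is left to whoever makes the investment decision.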
Types of data analytics
Descriptive analytics: describes something that has already happened
Diagnostic analytics: describes the reason for the historical results
Predictive analytics: estimates what is likely to happen by analysing historical data and trends
Prescriptive analytics: uses the information from descriptive, diagnostic, and predictive analytics to suggest specific decisions or courses of action
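The four types can be seen side by side on one small data set. The sketch below is only an illustration: the sales and advertising figures, the sales target and the very crude techniques chosen for each type are all assumptions, not part of these notes.

```python
from statistics import mean

# Hypothetical monthly figures, invented for illustration.
sales = [100, 110, 120, 130]
ad_spend = [10, 11, 12, 13]
sales_target = 150  # assumed target for next month

# Descriptive: what happened?
average_sales = mean(sales)

# Diagnostic: why did it happen? A crude check that sales moved in the
# same direction as advertising spend every month (not proof of cause).
moved_together = all(
    (s2 - s1) * (a2 - a1) > 0
    for s1, s2, a1, a2 in zip(sales, sales[1:], ad_spend, ad_spend[1:])
)

# Predictive: what is likely to happen? Extend the average monthly change.
average_change = (sales[-1] - sales[0]) / (len(sales) - 1)
forecast = sales[-1] + average_change

# Prescriptive: what should we do? Combine the three results above.
if forecast < sales_target and moved_together:
    recommendation = "increase advertising"
else:
    recommendation = "keep advertising unchanged"

print(average_sales, moved_together, forecast, recommendation)
```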
Predictive data analytics
The art of building and using models to make predictions based on patterns extracted from historical data
Applications of predictive data analytics
Price prediction
Diagnosis and dosage prediction
Risk assessment
Propensity modelling
Document classification
Machine learning (ML)
The automated process for extracting patterns from data
Supervised Machine Learning (SML)
Used to build the models used in predictive data analytics applications. SML techniques automatically learn a model of the relationship between a set of descriptive features and a target feature, based on historical examples
In general, the more historical examples are available, the better the learning in SML, provided the examples are representative of the cases to be predicted
Steps in SML
1. Learning
2. Predicting
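The two steps above can be sketched with one descriptive feature and one target feature. This is a minimal example under stated assumptions: the model is a straight line fitted by least squares (one of many possible SML techniques), and the floor-area and price figures are invented.

```python
def learn(examples):
    """Learning step: fit target = slope * feature + intercept by least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predicting step: apply the learned model to a new descriptive value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical historical examples: (floor area in m^2, price in thousands).
history = [(50, 150), (60, 180), (80, 240), (100, 300)]

model = learn(history)               # 1. Learning
predicted_price = predict(model, 70) # 2. Predicting
print(predicted_price)
```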
Other types of ML include unsupervised learning, semi-supervised learning and reinforcement learning
How ML works
An ML algorithm searches through a set of possible prediction models for the one that best explains the relationship between the descriptive features and the target feature in a dataset. The search criterion is consistency: the chosen model should agree with the historical examples
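The search can be illustrated with a deliberately tiny set of candidate models. In this sketch, each candidate is a threshold rule on a single descriptive feature, and the algorithm keeps the candidate that agrees with the most historical examples. The loan data, the threshold range and the helper names are assumptions made for the example.

```python
# Hypothetical historical examples: (income in thousands, loan repaid?).
examples = [(20, False), (25, False), (30, False),
            (45, True), (50, True), (60, True)]

def make_model(threshold):
    """Candidate model: predict 'repaid' when income reaches the threshold."""
    return lambda income: income >= threshold

def consistency(model, examples):
    """Number of historical examples the model predicts correctly."""
    return sum(model(income) == repaid for income, repaid in examples)

# The set of possible models to search through.
candidate_thresholds = range(10, 71, 5)

# Keep the candidate most consistent with the data (first one wins a tie).
best_threshold = max(
    candidate_thresholds,
    key=lambda t: consistency(make_model(t), examples),
)
print(best_threshold, consistency(make_model(best_threshold), examples))
```

Several thresholds fit these six examples equally well, so consistency alone does not single out one model; this sketch simply keeps the first.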
Unsupervised ML (UML)
We use UML when we do not have a target feature. Therefore, we model the underlying structure within the descriptive features in a dataset. We can look at it as a way of feature generation
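As a rough illustration of modelling structure without a target feature, the sketch below groups one descriptive feature into two clusters using a basic k-means loop, then uses the cluster label as a generated feature. The choice of k-means with two clusters, the spending figures and the function name are assumptions; the function also assumes the data contain at least two distinct values.

```python
def two_means(values, iterations=10):
    """Group 1-D values into two clusters with a basic k-means loop (k = 2)."""
    centres = [min(values), max(values)]  # simple deterministic starting point
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = [
            0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            for v in values
        ]
        # Move each centre to the mean of the values assigned to it.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            centres[k] = sum(members) / len(members)
    return centres, labels

# Hypothetical weekly spend per customer; no target feature is given.
spend = [12, 15, 14, 90, 95, 88]

centres, labels = two_means(spend)
print(centres, labels)  # 'labels' can serve as a new, generated feature
```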
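Reinforcement ML (RML)
RML is used to control the behaviour of autonomous systems. As the name suggests, learning is 'reinforced': the system tries actions, receives rewards or penalties as feedback, and gradually favours the actions that earn the most reward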
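The reward-feedback idea can be sketched with a toy problem in which a system repeatedly picks one of three actions and learns which pays best. Everything specific here is an assumption for illustration: the fixed reward values, the epsilon-greedy rule for balancing exploration and exploitation, and the function name.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn the value of each action from reward feedback (epsilon-greedy)."""
    rng = random.Random(seed)
    actions = range(len(rewards))
    estimates = [0.0] * len(rewards)  # learned value of each action
    counts = [0] * len(rewards)       # how often each action was tried
    for step in range(steps):
        if step < len(rewards):
            action = step                        # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards)) # explore: random action
        else:
            action = max(actions, key=estimates.__getitem__)  # exploit best so far
        reward = rewards[action]                 # feedback from the environment
        counts[action] += 1
        # Update the running average reward for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical fixed reward for each of three actions.
estimates, counts = run_bandit([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

Real autonomous systems face changing states and noisy, delayed rewards, so this only shows the core loop of act, observe reward, update.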