Week 3 good

Cards (19)

  • Big data
    A term used to describe the phenomenon of creating, curating and exploiting extremely large data sets to reveal patterns, trends and associations
  • Big data
    • Many big data sets are too large to be held physically in one location, so they are stored in the cloud or across multiple locations on large servers (research is currently underway into whether they could be stored in the oceans)
    • Big data can be used to analyse the behaviour of groups and individuals(!) or to spot trends or associations that could be useful to society (such as health-related issues)
  • Sources of big data
    • Mobile phones
    • Internet interactions, including search engines (Google)
    • Social media (Facebook, Instagram)
    • GPS devices (Google Maps, TomTom)
    • Smart speakers (Alexa, Google Home)
    • Buying behaviours (Amazon)
  • There are some practical problems with big data: storage, processing capacity, timeliness of analysis
  • Characteristics of big data
    • Velocity: speed of data generation and processing
    • Volume: amount generated / processed
    • Variety: structured vs. unstructured (free text) data
    • Value: monetisation and other uses
    • Veracity: trustworthiness of the data (e.g. biases and abnormalities that can affect decision-making)
    • Variability: changes in data
    • Visualisation: summarising
  • Data analytics
    Allows us to analyse raw data and extract useful trends and insights that help individuals and firms make decisions
  • Data analytics example
    • An analyst analyses stock price data for UBS, RBS, Deutsche Bank, JP Morgan and Morgan Stanley. Based on this analysis, the analyst can specify which stock has been the most volatile in the last quarter
  • The analyst's task is to comment on the volatility of stocks, not to make an investment decision
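
    The analyst's task above can be sketched in a few lines of Python. This is only an illustration: the prices are made-up numbers (not real market data), and measuring volatility as the standard deviation of daily returns is one common choice that these notes do not specify.

```python
from statistics import stdev

# Hypothetical daily closing prices, invented purely for illustration.
prices = {
    "UBS": [20.0, 20.2, 20.1, 20.3, 20.2],
    "RBS": [3.0, 3.3, 2.9, 3.4, 3.0],
    "JP Morgan": [150.0, 151.0, 150.5, 151.5, 151.0],
}


def daily_returns(series):
    """Percentage change between consecutive closing prices."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


def volatility(series):
    """Assumed measure: sample standard deviation of the daily returns."""
    return stdev(daily_returns(series))


vol_by_stock = {name: volatility(series) for name, series in prices.items()}
most_volatile = max(vol_by_stock, key=vol_by_stock.get)
print(most_volatile, vol_by_stock)
```

    The output only describes which stock moved the most; deciding whether to buy or sell is a separate step.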
  • Types of data analytics
    • Descriptive analytics: describes something that has already happened
    • Diagnostic analytics: describes the reason for the historical results
    • Predictive analytics: predicts what is likely to happen by analysing historical data and trends
    • Prescriptive analytics: uses the information from descriptive, diagnostic, and predictive analytics to suggest specific decisions or courses of action
  • Predictive data analytics
    The art of building and using models to make predictions based on patterns extracted from historical data
  • Applications of predictive data analytics
    • Price prediction
    • Diagnosis and dosage prediction
    • Risk assessment
    • Propensity modelling
    • Document classification
  • Machine learning (ML)
    The automated process for extracting patterns from data
  • Supervised Machine Learning (SML)
    Used to build the models used in predictive data analytics applications. SML techniques automatically learn a model of the relationship between a set of descriptive features and a target feature, based on historical examples
  • The more historical examples, the better the learning in SML
  • Steps in SML
    1. Learning
    2. Predicting
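
    The two steps can be illustrated with a minimal sketch. It assumes one descriptive feature, one numeric target feature, and a straight-line model fitted by least squares; the function names `learn` and `predict` and the example data are hypothetical, chosen only for this illustration.

```python
def learn(examples):
    """Learning step: fit target = slope * feature + intercept by least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predicting step: apply the learned model to a new descriptive feature value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical examples: (floor area in m2, price in thousands).
historical = [(50, 150), (60, 180), (80, 240), (100, 300)]

model = learn(historical)          # step 1: learning
predicted = predict(model, 70)     # step 2: predicting
print(predicted)
```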
  • Other types of ML include unsupervised learning, semi-supervised learning and reinforcement learning
  • How ML works
    ML algorithms search through a set of possible prediction models for the model that best explains the relationship between the descriptive features and the target feature in a dataset. The search criterion is consistency with the data
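
    A toy version of this search, as a sketch only: the dataset, the three candidate models, and the helper `is_consistent` are all hypothetical. Real algorithms search far larger model spaces and usually have to tolerate noise rather than demand perfect consistency.

```python
# Hypothetical dataset: ((age, income in thousands), loan approved?).
dataset = [
    ((25, 20), False),
    ((40, 60), True),
    ((35, 30), False),
    ((50, 80), True),
]

# Hypothetical candidate prediction models (simple rules).
candidates = {
    "age > 30": lambda age, income: age > 30,
    "income > 50": lambda age, income: income > 50,
    "age > 45": lambda age, income: age > 45,
}


def is_consistent(model, data):
    """A model is consistent if it reproduces the target for every example."""
    return all(model(*features) == target for features, target in data)


# Search: keep only the candidates that are consistent with the dataset.
consistent = [
    name for name, model in candidates.items() if is_consistent(model, dataset)
]
print(consistent)
```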
  • Unsupervised ML (UML)
    We use UML when we do not have a target feature. Instead, we model the underlying structure within the descriptive features of a dataset. It can be seen as a way of generating features
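
    As an illustration, clustering is one common form of UML. The sketch below assumes a single descriptive feature and uses a very small k-means with two clusters; the function name `two_means` and the spending figures are hypothetical. The resulting cluster label is a newly generated feature.

```python
def two_means(values, iterations=10):
    """Split one-dimensional values into two clusters (k-means with k = 2)."""
    # Assumption: start the two centroids at the smallest and largest value.
    centroids = [min(values), max(values)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for cluster in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == cluster]
            if members:
                centroids[cluster] = sum(members) / len(members)
    return labels, centroids


# Hypothetical monthly spend per customer; there is no target feature.
spend = [12, 15, 14, 95, 102, 99]
labels, centroids = two_means(spend)
print(labels, centroids)
```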
  • Reinforcement learning (RML)
    RML is used to control the behaviour of autonomous systems. As the name suggests, the learning is 'reinforced': the system is rewarded for desirable actions and penalised for undesirable ones
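
    A minimal sketch of this idea, under strong simplifying assumptions: the environment has two hypothetical actions with fixed rewards, and the agent always picks the action it currently rates highest. The names `REWARDS` and `train` are invented for this example; real systems deal with changing states and uncertain rewards.

```python
# Hypothetical environment: the reward the system receives for each action.
REWARDS = {"left": 0.0, "right": 1.0}


def train(steps=20, learning_rate=0.5, initial_estimate=1.0):
    """Learn action values from reward feedback alone."""
    # Optimistic starting estimates make the agent try every action once.
    estimates = {action: initial_estimate for action in REWARDS}
    history = []
    for _ in range(steps):
        # Act: choose the action with the highest current estimate.
        action = max(estimates, key=estimates.get)
        # Feedback: the environment returns a reward for that action.
        reward = REWARDS[action]
        # Reinforce: move the estimate towards the observed reward.
        estimates[action] += learning_rate * (reward - estimates[action])
        history.append(action)
    return estimates, history


estimates, history = train()
print(estimates, history[:3])
```

    After one unrewarded attempt at "left", the agent settles on "right", the action that the environment keeps rewarding.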