4.4.5 Data volume versus data quality

  • Data volume
    The amount of data being processed
  • Data quality
    The accuracy and trustworthiness of the data
  • The results of any data analysis are only as good as the original data being processed
  • Sampling
    • Improves the quality of the data at the expense of the volume
    • Researchers often select the best quality data from reliable sources
    • Researchers often exclude extreme values (outliers) as they are unlikely to be representative
  • The sheer volume of data involved in big data processing means there is a high probability that at least some of the data is of poor quality
  • Veracity
    Ensuring the correctness and trustworthiness of the data
  • Other 'Vs' describing big data
    • Value
    • Variability
    • Visualisation
  • The 'toos'
    • Too much data for traditional databases
    • Too complex for conventional categorisation
    • Too many updates to the data
  • Use of big data in the 2016 US presidential campaign
    • Large volumes of data on voters
    • Constant updates on voter opinions
    • Variety of data sources including social media and credit card purchases
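
The Sampling card says that excluding extreme values improves quality at the expense of volume. The sketch below illustrates that trade-off. It is only one possible approach: the cards do not say how extreme values are identified, so the 1.5 × interquartile-range rule used here is an assumption, and the function name `exclude_extremes` and the `readings` list are hypothetical, made up for illustration.

```python
import statistics


def exclude_extremes(values, k=1.5):
    """Keep only values within k * IQR of the lower and upper quartiles.

    The 1.5 * IQR cut-off is an assumed rule, not one given in the notes.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical data: mostly plausible readings plus two extreme values.
readings = [21, 22, 20, 23, 22, 21, 95, 22, 20, -40]
sample = exclude_extremes(readings)

# Volume goes down (10 values to 8) while the remaining data is more representative.
print(len(readings), len(sample))  # 10 8
```

The filtered sample is smaller than the original data set, which is the cost in volume that the card describes. Whether an extreme value is really an error or a genuine observation still needs human judgement.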