6.3.1 Data systems

Cards (45)

  • Data must first be transformed (wrangled) before a company can use it properly. This usually involves changing raw data into a standardised format
  • The first step of data wrangling is to fully understand what the data is about - this is called discovery
  • Data wrangling can reduce algorithmic bias as it makes a dataset more accurate for its purposes
  • The second step of data wrangling is to structure the data, which makes it easier to access - this is called structuring
  • The third step of data wrangling is to remove biased or inaccurate data - this is called cleansing
  • The fourth step of data wrangling is to enrich the data with anything that will help it meet specified needs, because external data rarely has all the required parts readily available - this is called enrichment
  • The fifth step of data wrangling is to validate the dataset: the data is checked for its reliability, quality and safety. This often involves ensuring the data is complete and meets the rules for each given field - this is called validation
  • The sixth step of data wrangling is to use the data once it is full and complete, both of which are confirmed once the data has been assured - this is known as publishing
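The six wrangling steps can be sketched in Python on a tiny in-memory dataset. This is a minimal illustration rather than a definitive method: the records, the field names (`name`, `age`, `region`), the lookup table `REGION_BY_NAME` and every function name are hypothetical.

```python
# Hypothetical raw records: names, fields and values are made up for illustration.
RAW = [
    "stuart, 34",
    "Priya,29",
    "  stuart, 34",   # becomes a duplicate once tidied
    "Alex,unknown",   # inaccurate: the age is not a number
]

# Hypothetical external data used for enrichment.
REGION_BY_NAME = {"Stuart": "North West", "Priya": "London"}


def discover(rows):
    """Step 1, discovery: summarise what the raw data looks like."""
    return {"rows": len(rows), "fields_per_row": {len(r.split(",")) for r in rows}}


def structure(rows):
    """Step 2, structuring: turn each raw string into a record with named fields."""
    records = []
    for row in rows:
        name, age = (part.strip() for part in row.split(","))
        records.append({"name": name.title(), "age": age})
    return records


def cleanse(records):
    """Step 3, cleansing: drop inaccurate rows and duplicates."""
    seen, clean = set(), []
    for rec in records:
        key = (rec["name"], rec["age"])
        if rec["age"].isdigit() and key not in seen:
            seen.add(key)
            clean.append({"name": rec["name"], "age": int(rec["age"])})
    return clean


def enrich(records):
    """Step 4, enrichment: add a field taken from another source."""
    return [{**rec, "region": REGION_BY_NAME.get(rec["name"])} for rec in records]


def validate(records):
    """Step 5, validation: check every record is complete and within range."""
    return all(rec["region"] is not None and 0 < rec["age"] < 130 for rec in records)


def publish(records):
    """Step 6, publishing: hand the finished dataset to its users."""
    return sorted(records, key=lambda rec: rec["name"])


summary = discover(RAW)
dataset = enrich(cleanse(structure(RAW)))
# Only publish data that passed validation.
published = publish(dataset) if validate(dataset) else []
```

Each step is a separate function so that the order (discover, structure, cleanse, enrich, validate, publish) is visible in the last three lines.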
  • All data systems have the same core functions; different organisations use these functions in different ways depending on how the data will be processed and analysed, as well as what data will be used
  • The core functions of all data systems are:
    Input
    Search
    Save
    Integrate
    Organise
    Output
    Feedback loop
  • For data systems what is input?
    collecting raw data
  • For data systems what is search?
    Searches ensure data meets the needs of an organisation
  • For data systems what is save?
    Storing data in a system to be used again
  • For data systems what is integrate?
    Integrating different forms of data into a single location, allowing for a complete output
  • Database normalization is the process of organizing data so that it can be easily accessed and updated while minimizing redundancy.
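As a rough sketch of the idea behind normalisation, the Python below splits one flat table, in which customer details repeat on every row, into two tables linked by an id. The table contents, the field names and the variable names (`flat_orders`, `customers`, `orders`) are all hypothetical, and a real database would do this with related tables rather than Python dictionaries.

```python
# Hypothetical flat table: the customer's email is repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer": "Stuart", "email": "stuart@example.com", "item": "Keyboard"},
    {"order_id": 2, "customer": "Stuart", "email": "stuart@example.com", "item": "Mouse"},
    {"order_id": 3, "customer": "Priya", "email": "priya@example.com", "item": "Monitor"},
]

customers = {}      # customer_id -> customer details, each stored once
orders = []         # each order refers to a customer by id
ids_by_email = {}   # helper used to spot customers already stored

for row in flat_orders:
    email = row["email"]
    if email not in ids_by_email:
        customer_id = len(ids_by_email) + 1
        ids_by_email[email] = customer_id
        customers[customer_id] = {"name": row["customer"], "email": email}
    orders.append({
        "order_id": row["order_id"],
        "customer_id": ids_by_email[email],
        "item": row["item"],
    })

# Updating an email now means changing one record rather than every order row.
customers[1]["email"] = "stuart.new@example.com"
```

Because each customer is stored once, there is less redundant data and an update cannot leave two conflicting copies of the same email.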
  • Data warehousing is an approach to storing large amounts of structured data from multiple sources into one central repository
  • For data systems what is organise?
    Organising and indexing saved data to ensure it meets the end user's requirements
  • For data systems what is output?
    The processed and analysed data is sent to the relevant people
  • For data systems what is the feedback loop?
    Measuring the outputs to evaluate the process effectiveness
  • Data has to be inputted into a digital system. This is often done by combining data stores. However, most digital data originates from a human manually inputting each value.
  • There are two main types of error that may occur when data is inputted manually:
    Transcription errors - when data is inputted with an incorrect character, for example by hitting the wrong key or two keys at once, such as Stuart being typed as Styart
    Transposition errors - when two characters are inputted in swapped order, such as Stuart being typed as Staurt
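The difference between the two error types can be shown with a small Python check. This is a simplified sketch: the function name `classify_entry_error` is hypothetical, it only recognises a single wrong character or a single swapped adjacent pair, and anything else is reported as "other".

```python
def classify_entry_error(source: str, typed: str) -> str:
    """Compare a typed value with its source and name the likely error type."""
    if typed == source:
        return "no error"
    if len(typed) != len(source):
        return "other"
    # Positions where the typed value differs from the source.
    diffs = [i for i, (a, b) in enumerate(zip(source, typed)) if a != b]
    if len(diffs) == 1:
        # Exactly one character is wrong.
        return "transcription"
    if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and source[diffs[0]] == typed[diffs[1]]
            and source[diffs[1]] == typed[diffs[0]]):
        # Two neighbouring characters have swapped places.
        return "transposition"
    return "other"
```

For example, `classify_entry_error("Stuart", "Styart")` reports a transcription error, while `classify_entry_error("Stuart", "Staurt")` reports a transposition error.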
  • How are data entry errors reduced?
    By validating and verifying inputted data
  • What is validation?
    Checking data is suitable and meets pre-set rules
  • What is verification?
    Seeing if data being entered into the digital system is identical to the source
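Verification can be sketched as a field-by-field comparison between the source and what was typed in. The function name `find_mismatches` and the example record are hypothetical; this is one simple way to do the comparison, not a standard routine.

```python
def find_mismatches(source: dict, entered: dict) -> list:
    """Return the names of fields whose entered value differs from the source."""
    return [field for field in source if entered.get(field) != source[field]]


# Hypothetical paper form (source) and the values typed into the system.
paper_form = {"name": "Stuart", "postcode": "M1 1AA"}
typed_form = {"name": "Staurt", "postcode": "M1 1AA"}

mismatched_fields = find_mismatches(paper_form, typed_form)
```

An empty list means the entered data is identical to the source; here the `name` field would be flagged for re-entry.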
  • Validation techniques can be used on any data entry, such as an online form, to reduce the risk of errors
  • It is good practice to pair validation with an error message that is shown if the data inputted is invalid
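A minimal Python sketch of validation paired with error messages is shown here. The form fields (`name`, `email`, `age`), the age limits and the function name `validate_form` are assumptions made for illustration, and the email pattern is deliberately simplified.

```python
import re


def validate_form(form: dict) -> list:
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []

    # Presence check: the field must not be left blank.
    if not form.get("name", "").strip():
        errors.append("Name is required.")

    # Format check: a simplified pattern, not a full email validator.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form.get("email", "")):
        errors.append("Email must look like name@example.com.")

    # Type check, then range check (the limits are assumed for this example).
    age = form.get("age", "")
    if not age.isdigit():
        errors.append("Age must be a whole number.")
    elif not 16 <= int(age) <= 120:
        errors.append("Age must be between 16 and 120.")

    return errors
```

Returning every message at once, rather than stopping at the first failure, lets the person entering the data correct all the problems in one go.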
  • If the data entered is incorrect, this can lead to GIGO, which stands for garbage in, garbage out. This means that the processed output will be incorrect because the data inputted was incorrect
  • When data needs to be entered for a large industry, data-entry screens are developed
  • Data-entry screens must be made suitable for the industry, so the developer has to understand the needs of that industry.
  • For example, data entered through data-entry screens often needs to be formatted correctly, so rules must be produced to avoid erroneous data entry
  • When data has been entered into digital systems it must be maintained, and this can be done in various ways:
    Carrying out regular scheduled searches to remove redundant or expired data
    Regularly updating data when it may change over time such as a mail list
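The first kind of maintenance, a scheduled search that removes expired data, can be sketched as follows. The record layout (an `expires` date on each entry), the example mailing list and the function name `remove_expired` are hypothetical.

```python
from datetime import date


def remove_expired(records: list, today: date) -> list:
    """Keep only the records whose expiry date has not yet passed."""
    return [rec for rec in records if rec["expires"] >= today]


# Hypothetical mailing list with an expiry date on each entry.
mailing_list = [
    {"email": "stuart@example.com", "expires": date(2030, 1, 1)},
    {"email": "old@example.com", "expires": date(2020, 1, 1)},
]

current_list = remove_expired(mailing_list, today=date(2025, 6, 1))
```

Passing `today` in as a parameter keeps the function easy to test; a scheduled job would pass `date.today()`.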
  • A company must, legally, be able to maintain its data. Whenever a user requests that their data be removed, the company must oblige (the right to be forgotten). Therefore it must be able to find the user's data and remove it, which is a form of maintenance.

    Other data subject rights apply too, such as the right to have data rectified, for example if the user's details change
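Both requests come down to being able to find one person's records. A minimal Python sketch, assuming every record carries a `user_id` field (the field name, the example records and both function names are hypothetical):

```python
def erase_user(records: list, user_id: int) -> list:
    """Right to be forgotten: return the records with this user's data removed."""
    return [rec for rec in records if rec["user_id"] != user_id]


def rectify_user(records: list, user_id: int, **changes) -> list:
    """Right to rectify: return the records with this user's fields corrected."""
    return [
        {**rec, **changes} if rec["user_id"] == user_id else rec
        for rec in records
    ]


# Hypothetical stored records.
records = [
    {"user_id": 1, "name": "Stuart", "city": "Leeds"},
    {"user_id": 2, "name": "Priya", "city": "London"},
]

after_rectify = rectify_user(records, 1, city="Manchester")
after_erase = erase_user(after_rectify, 2)
```

A real system would also need to reach copies of the data held elsewhere, such as backups; that is outside this sketch.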
  • Once data has been inputted, processed and analysed, it has to be output (presented) in a format that makes it helpful to the end users. The four main ways information is presented are:
    • Graphs/charts
    • Data tables
    • Reports
    • Infographics
  • Graphs and charts are best used to present numerical data. It is possible for the end user to misinterpret them, so their design must be considered carefully
  • Data tables are useful when data is related, such as the percentage increase in train users across different regions of England
  • Data tables work best with small amounts of data, as this makes them easy to interpret. They must be properly labelled and are often coloured to present a message.
  • Reports are generally written information about data patterns and trends. They are often used when the end user is presenting information to people who may not have the context, as a report lets the user curate what they wish to present
  • Infographics are a great way of making data more memorable and easier to relate to the real world. They often cover an entire topic, giving a general idea rather than much detail
  • All data that is gathered, processed and analysed must have a high level of reliability and quality
  • Data assurance checks that data is reliable and of good quality