Collection, Classification and Presentation of Data

Cards (39)

  • Data collection is concerned with the accurate acquisition of data; although methods may differ depending on the field, the emphasis on ensuring accuracy remains the same
  • Accurate data collection is essential to ensure the integrity of the research, regardless of the field of study or data type (quantitative or qualitative)
  • Data presentation
    Method by which people summarize, organize and communicate information using a variety of tools, such as diagrams, distribution charts, histograms and graphs
  • Coding data
    Typically used to assign numbers to measurements that do not naturally have numbers attached to them
  • Coding data
    Typically used to represent non-numerical values as numbers so that they can be processed electronically
  • Stem-and-leaf diagrams
    A popular way of representing numerical data, e.g. human ages or dates of birth, as a visual image of its distribution: each value is split into a stem (its leading digits) and a leaf (its last digit), so the original values are kept
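The split into stems and leaves can be sketched in Python. This is a minimal illustration that assumes non-negative whole numbers and uses the last digit as the leaf; the function names and the `ages` list are hypothetical example data, not taken from the deck.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem to its sorted leaves (assumes non-negative integers, non-empty input)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 47 -> stem 4, leaf 7
        groups[stem].append(leaf)
    # Keep empty stems so gaps in the distribution remain visible
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}


def format_stem_and_leaf(diagram):
    """Render the diagram as 'stem | leaves' rows."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in diagram.items()
    )


ages = [23, 35, 31, 47, 22, 38, 29, 41, 35, 26]  # hypothetical ages
print(format_stem_and_leaf(stem_and_leaf(ages)))
# 2 | 2 3 6 9
# 3 | 1 5 5 8
# 4 | 1 7
```

Reading across a row recovers the original values (row 3 holds 31, 35, 35 and 38), which is what distinguishes this display from a histogram.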
  • Data coding
    The process of converting data into a form that can be analyzed, involving assigning numerical or categorical codes to data items
  • Types of data coding
    • Nominal coding
    • Ordinal coding
    • Dichotomous coding
    • Numeric coding
  • Nominal coding
    Assigning labels or categories with no inherent order to data items (e.g., 1 = red, 2 = blue)
  • Ordinal coding
    Assigning codes to categories that have a meaningful order (e.g., 1 = low, 2 = medium, 3 = high)
  • Dichotomous coding
    Assigning a binary code (e.g., 0 or 1) to data items that fall into one of two categories
  • Numeric coding
    Assigning numerical values to data items
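As a rough illustration, the four coding types might be applied to a small set of survey answers as follows. The answers and code tables are hypothetical, and numeric coding is read here as recording a value that is already a quantity (age in years) as a number, which is one interpretation of the card above.

```python
# Hypothetical survey answers, for illustration only
colours = ["red", "blue", "green", "blue"]      # nominal: no natural order
sizes = ["small", "large", "medium", "small"]   # ordinal: natural order
smoker = ["yes", "no", "no", "yes"]             # exactly two categories
ages = ["34", "27", "51", "42"]                 # quantities recorded as text

# Nominal coding: the numbers are labels only; 3 is not "more" than 1
nominal_codes = {"red": 1, "blue": 2, "green": 3}
coded_colours = [nominal_codes[c] for c in colours]

# Ordinal coding: codes follow the order of the categories
ordinal_codes = {"small": 1, "medium": 2, "large": 3}
coded_sizes = [ordinal_codes[s] for s in sizes]

# Dichotomous coding: one category -> 1, the other -> 0
coded_smoker = [1 if answer == "yes" else 0 for answer in smoker]

# Numeric coding: the value itself becomes the number
coded_ages = [int(a) for a in ages]

print(coded_colours, coded_sizes, coded_smoker, coded_ages)
# [1, 2, 3, 2] [1, 3, 2, 1] [1, 0, 0, 1] [34, 27, 51, 42]
```

Only the ordinal and numeric codes support "greater than" comparisons; averaging nominal codes would produce a meaningless number.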
  • Data classification
    The process of sorting data into categories using various methods and criteria so that it can be organized and managed within a database or repository
  • Examples and application of data classification
    • Separating customer data based on gender
    • Identifying and keeping frequently used data in disk/memory cache
    • Sorting data by content/file type, size and date
    • Sorting for security reasons by classifying data into restricted, public or private data types
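The examples above share one pattern: pick a criterion, then place each item in the class that criterion gives it. A minimal sketch, assuming hypothetical file records with `name`, `size` and `access` fields and an arbitrary 1,000,000-byte cut-off between "small" and "large":

```python
from collections import defaultdict


def classify(records, key):
    """Group record names into classes using the criterion returned by key()."""
    classes = defaultdict(list)
    for record in records:
        classes[key(record)].append(record["name"])
    return dict(classes)


# Hypothetical file records, for illustration only
files = [
    {"name": "report.pdf", "size": 2_400_000, "access": "restricted"},
    {"name": "logo.png", "size": 35_000, "access": "public"},
    {"name": "notes.txt", "size": 1_200, "access": "private"},
    {"name": "scan.pdf", "size": 800_000, "access": "public"},
]

by_type = classify(files, key=lambda f: f["name"].rsplit(".", 1)[-1])  # file type
by_size = classify(files, key=lambda f: "large" if f["size"] >= 1_000_000 else "small")
by_access = classify(files, key=lambda f: f["access"])  # security class

print(by_type)  # {'pdf': ['report.pdf', 'scan.pdf'], 'png': ['logo.png'], 'txt': ['notes.txt']}
```

Passing the criterion as a function keeps the grouping logic the same whichever classification rule is chosen.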
  • Data collection
    The gathering of a set of observations about variables; it is the starting point of any research method
  • Types of data
    • Primary data
    • Secondary data
  • Primary data
    Data collected for the first time, in raw form, directly from the source
  • Methods of collecting primary data
    • Direct personal observation
    • Indirect oral interviews
    • Mailed questionnaire method
    • Schedule method
    • Information from local agents
  • Secondary data
    Second-hand information that has already been collected by someone else; generally used when the time available for the enquiry is short and some loss of accuracy is acceptable
  • Sources of secondary data
    • Published sources
    • Unpublished sources
  • Population
    The entire group that you want to draw conclusions about
  • Sample
    The specific group that you will collect data from; it is a subset of the population, so its size is smaller than that of the population
  • Unbiased
    A measurement method is unbiased when the average of a large set of its measurements is close to the true value
  • Precise
    A measurement method is precise when repeated measurements are close to one another, though not necessarily close to the true value
  • An estimate of a parameter taken from a random sample, such as the sample mean as an estimate of the population mean, is unbiased, and it becomes more precise as the sample size increases
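A small simulation can make the last three cards concrete. The sketch below assumes a hypothetical, normally distributed population and uses the sample mean as the estimate; the population parameters, sample sizes and number of repeats are arbitrary choices, and `statistics.fmean` needs Python 3.8 or later. The average of the sample means should land close to the true mean at every sample size (unbiased), while their spread shrinks as the sample size grows (more precise).

```python
import random
import statistics


def sample_means(population, sample_size, repeats=1000, seed=1):
    """Draw repeated simple random samples and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]


# Hypothetical population of 10,000 values (mean about 50, sd about 10)
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

results = {}
for n in (5, 50, 500):
    means = sample_means(population, n)
    bias = statistics.fmean(means) - true_mean  # near 0 -> unbiased
    spread = statistics.stdev(means)            # smaller -> more precise
    results[n] = (bias, spread)
    print(f"n={n:>3}  bias={bias:+.3f}  spread={spread:.3f}")
```

The spread falls roughly in proportion to one over the square root of the sample size, so a tenfold larger sample gives a little over three times the precision.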
  • Data presentation and display
    Involves more than just drawing graphs: it also requires understanding the type of data, the intended audience, and how the information will be used
  • Decisions should not be made based on graphs alone, as no graph can tell you everything you need to know
  • The purpose of presenting data graphically
    To provide information to assist in decision making and to monitor activities in progress
  • Ways of displaying or presenting data
    • Stem-and-leaf
    • Time sequence plot
    • Control chart
    • Lag plot
    • Scatter plot
    • Digidot plot
    • Dot plot
    • Histogram
    • Boxplot
  • Outlier
    A data point that is extremely high or extremely low relative to the nearest data point and the rest of the neighboring values in a data graph or dataset
  • Outlier-related concepts
    • Interquartile range
    • Determining outliers
    • Strong outliers
    • Weak outliers
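The outlier concepts in this card can be sketched with the common 1.5 × IQR and 3 × IQR fences, a widely used convention assumed here rather than stated in the deck: points beyond 1.5 × IQR from the quartiles but within 3 × IQR are treated as weak outliers, and points beyond 3 × IQR as strong outliers. The quartiles come from `statistics.quantiles(..., method="exclusive")` (Python 3.8+), which is one of several quartile conventions; the data are hypothetical.

```python
import statistics


def find_outliers(data, k_weak=1.5, k_strong=3.0):
    """Classify points outside the IQR fences as weak or strong outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1  # interquartile range
    inner_lo, inner_hi = q1 - k_weak * iqr, q3 + k_weak * iqr
    outer_lo, outer_hi = q1 - k_strong * iqr, q3 + k_strong * iqr
    weak, strong = [], []
    for x in data:
        if x < outer_lo or x > outer_hi:
            strong.append(x)  # beyond the outer fences
        elif x < inner_lo or x > inner_hi:
            weak.append(x)    # between the inner and outer fences
    return {"q1": q1, "q3": q3, "iqr": iqr, "weak": weak, "strong": strong}


data = [10, 12, 12, 13, 14, 15, 15, 16, 17, 18, 19, 35, 60]  # hypothetical
print(find_outliers(data))
# {'q1': 12.5, 'q3': 18.5, 'iqr': 6.0, 'weak': [35], 'strong': [60]}
```

Here the inner fences are 3.5 and 27.5 and the outer fences are -5.5 and 36.5, so 35 is a weak outlier and 60 a strong one.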
  • Descriptive statistics are sensitive to outliers, which is why it is important to check for them
  • Descriptive statistics
    Summary statistics that quantitatively describe or summarize features of a collection of information
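To see why outliers matter for descriptive statistics, compare the mean and the median of a small hypothetical data set before and after one extreme value is added. The mean is pulled strongly towards the outlier, while the median does not move.

```python
import statistics

scores = [12, 13, 14, 15, 15, 16, 17]  # hypothetical data
with_outlier = scores + [90]           # the same data plus one extreme value

for label, values in (("without outlier", scores), ("with outlier", with_outlier)):
    mean = statistics.fmean(values)     # pulled towards extreme values
    median = statistics.median(values)  # depends only on the middle value(s)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
# without outlier: mean = 14.57, median = 15
# with outlier: mean = 24.00, median = 15.0
```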
  • Classification
    The process of arranging things in groups or classes according to their resemblances and affinities; it gives expression to the unity of attributes that may subsist amongst a diversity of individuals
  • Classification methods
    • Equal interval
    • Quantile
    • Natural breaks (Jenks)
  • Equal interval classification
    Divides the range of attribute values into equal-sized sub-ranges; you specify the number of intervals and the scheme determines where the breaks fall
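A minimal sketch of equal-interval classification: the user chooses the number of classes k, and the break values follow from the minimum and maximum of the data. The function names and sample values are hypothetical, and a value equal to a break is assigned to the lower class here, which is one possible convention.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper break of each of k equal-width classes spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + width * i for i in range(1, k)]
    return breaks + [float(hi)]  # last break is exactly the maximum


def assign_classes(values, breaks):
    """Class index for each value; a value equal to a break joins the lower class."""
    return [bisect_left(breaks, v) for v in values]


values = [2, 7, 15, 22, 38, 41, 60, 75, 98]  # hypothetical data
breaks = equal_interval_breaks(values, 4)
print(breaks)                          # [26.0, 50.0, 74.0, 98.0]
print(assign_classes(values, breaks))  # [0, 0, 0, 0, 1, 1, 2, 3, 3]
```

Every class is 24 units wide, but the classes hold 4, 2, 1 and 2 values: equal widths do not imply equal counts.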
  • Quantile classification
    Each class contains an equal number of features; the scheme is well-suited to linearly distributed data
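Quantile classification can be sketched by sorting the values and cutting the ordered list into k runs of (nearly) equal length. This is a simplified illustration with hypothetical data that assumes k is no larger than the number of values; it makes no attempt to keep tied values in the same class.

```python
def quantile_classes(values, k):
    """Split sorted values into k classes with (as nearly as possible) equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer division decides where the cuts fall; class sizes differ by at most one
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


values = [2, 7, 15, 22, 38, 41, 60, 75, 98]  # hypothetical data
classes = quantile_classes(values, 3)
print(classes)                   # [[2, 7, 15], [22, 38, 41], [60, 75, 98]]
print([c[-1] for c in classes])  # upper break of each class: [15, 41, 98]
```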
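  • Finding the 20th percentile (the value in a data set below which 20% of values fall and above which 80% fall)
    1. Order the data from smallest to largest
    2. Count the number of observations, n
    3. Convert the percentage to a decimal, q (20% → 0.2)
    4. Insert the values into the formula i = q(n + 1), where i is the position (rank) of the required observation in the ordered data
    5. If i is not a whole number, interpolate between the observations on either side of position i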
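The percentile steps translate directly into code. In this sketch a fractional position i is handled by linear interpolation between the neighbouring observations, and positions below 1 or above n are clamped to the smallest or largest value; both are assumptions about how such cases are treated. For positions that fall between the first and last observations, the result should agree with Python's `statistics.quantiles(..., method="exclusive")`, which uses the same (n + 1) positioning. The function name and data are hypothetical.

```python
def percentile_by_position(data, q):
    """Value at position i = q(n + 1) of the ordered data (q is a decimal, e.g. 0.2)."""
    ordered = sorted(data)  # step 1: order the data
    n = len(ordered)        # step 2: count the observations
    i = q * (n + 1)         # step 4: position of the required observation
    if i <= 1:              # assumed: clamp positions before the first value
        return ordered[0]
    if i >= n:              # assumed: clamp positions after the last value
        return ordered[-1]
    whole = int(i)          # observation just below position i (1-based)
    fraction = i - whole
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)  # linear interpolation


data = [21, 3, 12, 30, 8, 18, 25, 15, 7]  # hypothetical data, n = 9
print(percentile_by_position(data, 0.2))  # i = 0.2 * 10 = 2, so the 2nd ordered value: 7.0
```

With ten observations instead of nine, i = 0.2 × 11 = 2.2, and the function returns a value 20% of the way from the 2nd to the 3rd ordered observation.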
  • Natural breaks (Jenks) classification
    Classes are based on natural groupings inherent in the data; break points are chosen to best group similar values and maximize the differences between classes
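Natural breaks can be sketched as a search for the split of the ordered values into k contiguous classes with the smallest total within-class sum of squared deviations from the class means, which is the quantity Jenks' method minimises. This version simply tries every possible split, so it is only practical for small data sets; practical implementations usually rely on a faster dynamic-programming (Fisher-Jenks) approach. Names and data are hypothetical.

```python
from itertools import combinations


def squared_deviations(group):
    """Sum of squared deviations of a class from its own mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def natural_breaks(values, k):
    """Exhaustively find k contiguous classes of the sorted values with the
    smallest total within-class squared deviation (assumes 1 <= k <= len(values))."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_classes = float("inf"), None
    # Each choice of k - 1 cut positions defines one candidate classification
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        classes = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(squared_deviations(c) for c in classes)
        if cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes


values = [1, 2, 3, 10, 11, 12, 30, 31, 32]  # hypothetical data
print(natural_breaks(values, 3))  # [[1, 2, 3], [10, 11, 12], [30, 31, 32]]
```

Unlike equal-interval or quantile classes, the breaks here fall in the gaps between clusters of values.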