Data analysis

  • Statistics as a science deals with the collection, organization, presentation, analysis, and interpretation of data
  • Statistic as a number is a collection of facts and figures, processed data, and information
  • Categories/Phases of Statistics:
    • Descriptive Statistics: Concerned with organization, presentation, and summarization of a set of data without inferring beyond the data at hand
  • Basic Concepts:
    • Universe: Set of all entities under study
    • Variables: Attributes measurable from each individual in the universe
    • Population: Set of all observed values of the variable
    • Sample: A part/subset of the population or universe taken to represent the whole
  • Levels of Measurement:
    • Nominal: Data collected are labels or categories without an explicit ordering
    • Ordinal: Data collected are labels with an implied ordering, where ranking can be done but distance between labels cannot be quantified
    • Interval: Data collected can be ordered and added or subtracted, but not divided or multiplied
    • Ratio: Data collected has all properties of the interval scale and can be multiplied and divided
  • Types of Data:
    • Primary Data: Acquired directly from the source
    • Secondary Data: Second-hand data
  • Methods of Data Presentation:
    • 1. Textual Method: Presenting data using paragraphs
    • 2. Tabular Method: Organizing raw data in table form
    • 3. Graphical Method: Visual representation aiding in understanding data at a glance
  • Numerical Descriptive Measures:
    • Measures of Location: Values within the data set describing their location or position relative to the entire data set
    • Measure of Central Tendency:
    • Mean: Sum of all values divided by the number of values
    • Median: Value dividing the data set into two equal parts
    • Mode: Most frequently occurring value
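As a quick illustration of the three measures of central tendency, here is a minimal Python sketch using the standard `statistics` module. The data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration
data = [3, 5, 5, 8, 9]

mean_value = statistics.mean(data)      # sum of all values / number of values
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```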
  • Dividing data into quartiles:
    • Q1 = 8, Q2 = 12, Q3 = 16
  • Range (R) is the difference between the greatest and least data value
  • Variance (σ²) takes into account the spread of all data points in a data set; it is the average squared deviation from the mean
  • Coefficient of Variation (CV) is the standard deviation expressed as a percentage of the mean
  • Measures of Skewness:
    • SK>0 → Positively Skewed
    • SK<0 → Negatively Skewed
    • SK=0 → Symmetric (e.g., Normal/Bell-shaped)
  • Measure of Kurtosis (K taken as excess kurtosis, measured relative to the normal distribution):
    • K>0 → Leptokurtic
    • K<0 → Platykurtic
    • K=0 → Normal/Mesokurtic
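The cards give only the sign rules for SK and K, not the formulas. The sketch below assumes the moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3, with population central moments); other textbooks use Pearson's coefficients instead, so treat this as one possible choice. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment using the population (divide-by-n) convention."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2^2 - 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # hypothetical, symmetric about 3
right_tailed = [1, 1, 1, 2, 10]  # hypothetical, long tail to the right

print(skewness(symmetric))         # 0 for symmetric data
print(skewness(right_tailed))      # positive: positively skewed
print(excess_kurtosis(symmetric))  # negative: platykurtic (flatter than normal)
```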
  • Inferential Statistics: Concerned with making objective generalizations about a larger data set based on information from a part of it
  • Standard Deviation (σ for a population, s for a sample) - Measures the amount by which individual data values deviate from the mean; it is the square root of the variance. Formula: s = √(s²)
  • Coefficient of Variation (CV) - The standard deviation expressed as a percentage of the mean: CV = (s / mean) × 100% - The higher the CV, the higher the variation (directly proportional) - The CV of highly precise analyzers can be lower than 1%
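The measures of dispersion (range, variance, standard deviation, CV) can be sketched with Python's standard `statistics` module. The data are hypothetical. The population formulas (divide by n) are used for the main values, with the sample standard deviation (divide by n - 1) shown for comparison.

```python
import statistics

# Hypothetical measurements, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)         # greatest value minus least value
pop_variance = statistics.pvariance(data)  # population variance (divide by n)
pop_std_dev = statistics.pstdev(data)      # square root of the population variance
sample_std_dev = statistics.stdev(data)    # sample version (divide by n - 1)

# CV: standard deviation expressed as a percentage of the mean
cv_percent = pop_std_dev / statistics.mean(data) * 100

print(data_range, pop_variance, pop_std_dev, cv_percent)
```

For this data set the mean is 5, the population variance is 4, the population standard deviation is 2, and the CV is 40%.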
  • PERCENTILES – Divide the array of data into 100 equal parts.
  • DECILES – Divide the array of data into 10 equal parts.
  • QUARTILES – Divide the array of data into 4 equal parts.
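A minimal sketch of these cut points using `statistics.quantiles` from the Python standard library. The data set is hypothetical, and the `"inclusive"` interpolation method is an assumption; other methods can give slightly different cut points for the same data.

```python
import statistics

# Hypothetical sorted data, chosen only for illustration
data = [4, 8, 12, 16, 20]

# n is the number of equal parts; the function returns the n - 1 cut points.
# method="inclusive" treats the data as the whole population (an assumption here).
quartiles = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")
percentiles = statistics.quantiles(data, n=100, method="inclusive")

print(quartiles)         # [8.0, 12.0, 16.0]
print(len(deciles))      # 9 cut points for 10 parts
print(len(percentiles))  # 99 cut points for 100 parts
```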
  • PROPERTIES OF THE MEAN - Unique - Amenable to algebraic manipulations - Not defined for qualitative data - Easily affected by extreme values
  • PROPERTIES OF THE MEDIAN - Unique - Not amenable to algebraic manipulations - Applicable only to variables for which ordering of values is possible - Not affected by extreme values
  • PROPERTIES OF THE MODE - Not affected by extreme values - Defined for both quantitative and qualitative data - Not unique
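The properties listed for the mean, median, and mode can be checked on a small example. The sketch below adds one extreme value to a hypothetical data set and also takes the mode of qualitative data; all values are made up for illustration.

```python
import statistics

# Hypothetical scores, chosen only for illustration
scores = [10, 12, 12, 13, 15]
with_outlier = scores + [100]  # add one extreme value

mean_before, mean_after = statistics.mean(scores), statistics.mean(with_outlier)
median_before, median_after = statistics.median(scores), statistics.median(with_outlier)
mode_before, mode_after = statistics.mode(scores), statistics.mode(with_outlier)

# The mode is also defined for qualitative data and need not be unique
colors = ["red", "blue", "red", "blue", "green"]
color_modes = statistics.multimode(colors)

print(mean_before, mean_after)      # 12.4 27
print(median_before, median_after)  # 12 12.5
print(mode_before, mode_after)      # 12 12
print(color_modes)                  # ['red', 'blue']
```

The mean more than doubles after the extreme value is added, while the median barely moves and the mode does not change.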
  • NOMINAL (descriptive) EXAMPLE:
    • Gender, Personal Preference, Nationality, Personality Type, Continent, Hair Color
  • ORDINAL (descriptive) EXAMPLE:
    • Educational Level, Socioeconomic Status, Income Level
  • INTERVAL (inferential) EXAMPLE:
    • Temperature in Celsius
  • RATIO (inferential) EXAMPLE:
    • Weight, Height, Temperature in kelvin, Length of time
  • LEVELS OF MEASUREMENT - Characteristic of the data collected as a consequence of the operational definition used in the data collection. - Dictate the type of mathematical operations which can be performed. - Determine the appropriate statistical methods to be done.
  • NOMINAL (descriptive) - Data collected are simply labels, names, or categories without any explicit ordering of labels. - Observations with the same label belong to the same category. - Frequencies or counts of observations belonging to the same category can be obtained.
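Counting observations per category is the main operation available for nominal data. A minimal sketch with `collections.Counter`, using hypothetical hair-color observations:

```python
from collections import Counter

# Hypothetical nominal observations, chosen only for illustration
hair_colors = ["black", "brown", "black", "blond", "brown", "black"]

# Labels can be counted, but they cannot be ranked or averaged
frequencies = Counter(hair_colors)

print(frequencies.most_common())  # [('black', 3), ('brown', 2), ('blond', 1)]
```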
  • ORDINAL (descriptive) - Data collected are labels or classes with an implied ordering of these labels. - Ranking can be done on the data. - Distance between 2 labels cannot be quantified.
  • INTERVAL (inferential) - Data collected can be ordered and, in addition, may be added or subtracted, but not divided or multiplied. - Zero point is arbitrary. - Indicates an actual amount (numerical). The order of and the difference between values can be known. Its limitation is that it has no “True Zero”.
  • RATIO (inferential) - Data collected has all the properties of the interval scale, and in addition, can be multiplied and divided. - Zero point is absolute. - It has the same properties as the interval level. The order and difference can be described. Additionally, it has a true zero and the ratio between two points has a meaning.
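The difference between the interval and ratio levels can be seen with the two temperature scales used as examples in these cards. In this sketch the readings and the helper name `celsius_to_kelvin` are hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale Celsius reading to the ratio-scale kelvin."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # hypothetical readings in degrees Celsius

# Differences are meaningful on both scales and agree with each other
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio scale, where zero is absolute
naive_ratio = warm_c / cool_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, diff_k)
print(naive_ratio, true_ratio)
```

The Celsius ratio of 2.0 is an artifact of the arbitrary zero point; on the kelvin scale the same two temperatures differ by a factor of only about 1.035.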
  • Statistics as a science - Deals with the collection, organization, presentation, analysis, and interpretation of data. - Is the study of variation. - Is the study of making objective or fair conclusions/generalizations about a larger set of data based on information from a part of it.
  • Statistic as a number - Is a collection of facts and figures - Is processed data - Is information