Data analysis

    • Statistics as a science deals with the collection, organization, presentation, analysis, and interpretation of data
    • Statistic as a number is a collection of facts and figures, processed data, and information
    • Categories/Phases of Statistics:
      • Descriptive Statistics: Concerned with organization, presentation, and summarization of a set of data without inferring beyond the data at hand
    • Basic Concepts:
      • Universe: Set of all entities under study
      • Variables: Attributes measurable from each individual in the universe
      • Population: Set of all observed values of the variable
      • Sample: A part/subset of the population or universe taken to represent the whole
    • Levels of Measurement:
      • Nominal: Data collected are labels or categories without an explicit ordering
      • Ordinal: Data collected are labels with an implied ordering, where ranking can be done but distance between labels cannot be quantified
      • Interval: Data collected can be ordered and added or subtracted, but not divided or multiplied
      • Ratio: Data collected has all properties of the interval scale and can be multiplied and divided
    • Types of Data:
      • Primary Data: Acquired directly from the source
      • Secondary Data: Second-hand data, i.e., data originally collected by someone else
    • Methods of Data Presentation:
      • 1. Textual Method: Presenting data using paragraphs
      • 2. Tabular Method: Organizing raw data in table form
      • 3. Graphical Method: Visual representation aiding in understanding data at a glance
    • Numerical Descriptive Measures:
      • Measures of Location: Values within the data set describing their location or position relative to the entire data set
      • Measures of Central Tendency:
        • Mean: Sum of all values divided by the number of values
        • Median: Middle value that divides the ordered (arrayed) data set into two equal parts
        • Mode: Most frequently occurring value
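The three measures of central tendency can be computed with Python's standard `statistics` module. A minimal sketch on a small hypothetical data set (the values are made up for illustration):

```python
import statistics

# Hypothetical sample data (illustration only)
data = [3, 7, 7, 2, 9, 5, 7, 4]

# Mean: sum of all values divided by the number of values
mean = sum(data) / len(data)

# Median: sort the data, then take the middle value
# (with an even count, the average of the two middle values)
median = statistics.median(data)

# Mode: most frequently occurring value
mode = statistics.mode(data)

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the sorted data is 2, 3, 4, 5, 7, 7, 7, 9, so the median is the average of 5 and 7.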
    • Dividing data into quartiles:
      • Example: Q1 = 8, Q2 = 12 (the median), Q3 = 16
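Quartiles can be sketched with `statistics.quantiles` (Python 3.8+). The data below is hypothetical and is not the data set behind the Q1 = 8, Q2 = 12, Q3 = 16 example. Textbooks use several slightly different quartile formulas; this call uses Python's default `"exclusive"` method, so hand-computed answers may differ a little on other data.

```python
import statistics

# Hypothetical data set of 11 values, already sorted
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

# n=4 returns the three cut points Q1, Q2, Q3 (default method="exclusive")
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 6.0 12.0 18.0

# Q2 is always the median
print(q2 == statistics.median(data))  # True
```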
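    • Range (R) is the difference between the greatest and least data values
    • Variance (σ² for a population, s² for a sample) takes into account the spread of all data points in a data set, measured through their squared deviations from the mean
    • Coefficient of Variation (CV) is the standard deviation expressed as a percentage of the mean: CV = (s / x̄) × 100%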
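Range and variance follow directly from their definitions. This sketch assumes the usual convention that the population variance σ² divides the sum of squared deviations by N, while the sample variance s² divides by n − 1; the data is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Range: greatest value minus least value
data_range = max(data) - min(data)

# Squared deviation of every data point from the mean
mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]

# Population variance (sigma squared): divide by N
pop_var = sum(squared_devs) / len(data)

# Sample variance (s squared): divide by n - 1
sample_var = sum(squared_devs) / (len(data) - 1)

print(data_range, pop_var, sample_var)
```

The standard library's `statistics.pvariance` and `statistics.variance` give the same two results.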
    • Measures of Skewness:
      • SK>0 → Positively Skewed
      • SK<0 → Negatively Skewed
      • SK=0 → Symmetric (e.g., Normal/Bell-shaped)
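The cards do not say which formula SK refers to. Assuming it is Pearson's second (median) coefficient of skewness, SK = 3(x̄ − median) / s, the sign rule above can be sketched as follows; the function names and data sets are hypothetical.

```python
import statistics

def pearson_skewness(data):
    """Assumed SK formula: 3 * (mean - median) / s, with s the sample std. dev."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    return 3 * (mean - median) / s

def describe_skew(sk):
    # Sign rule from the cards
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # mean pulled above the median
left_tail = [1, 8, 8, 9, 9, 9, 10]       # mean pulled below the median
symmetric = [1, 2, 3, 4, 5]              # mean equals the median

for sample in (right_tail, left_tail, symmetric):
    print(describe_skew(pearson_skewness(sample)))
```

With real measurements, floating-point rounding can leave SK slightly different from exactly 0 even for nearly symmetric data.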
    • Measure of Kurtosis:
      • K>0 → Leptokurtic
      • K<0 → Platykurtic
      • K=0 → Normal/Mesokurtic
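The cards compare K with 0, which matches excess kurtosis (a normal curve gives K = 0). Assuming the moment-based definition K = m4 / m2² − 3, where m2 and m4 are the second and fourth central moments, a minimal sketch with hypothetical function names and data:

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: K = m4 / m2**2 - 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # 2nd central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # 4th central moment
    return m4 / m2 ** 2 - 3

def describe_kurtosis(k):
    # Sign rule from the cards
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]          # evenly spread, no tails
peaked = [0] * 8 + [-10, 10]    # tall center with heavy tails

print(describe_kurtosis(excess_kurtosis(flat)))    # platykurtic
print(describe_kurtosis(excess_kurtosis(peaked)))  # leptokurtic
```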
    • Inferential Statistics: Concerned with making objective generalizations about a larger data set based on information from a part of it
    • Standard Deviation (σ for a population, s for a sample) - The square root of the variance; it measures the amount by which individual data values typically deviate from the mean. Formula: s = √(s²)
    • Coefficient of Variation (CV) - The standard deviation expressed as a percentage of the mean, CV = (s / x̄) × 100% - The higher the CV, the higher the relative variation (directly proportional) - The CV of highly precise analyzers can be lower than 1%
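The CV is commonly computed as the sample standard deviation divided by the mean, times 100. A sketch comparing two sets of hypothetical repeated measurements; the function name and the readings are assumptions for illustration.

```python
import math
import statistics

def coefficient_of_variation(data):
    """CV as a percentage: (sample standard deviation / mean) * 100.

    Assumes the mean is nonzero.
    """
    s = statistics.stdev(data)  # s = sqrt(s^2)
    return s / statistics.mean(data) * 100

# Hypothetical repeated measurements of the same specimen
precise = [100.2, 99.8, 100.1, 99.9, 100.0]
imprecise = [95.0, 105.0, 98.0, 102.0, 100.0]

cv_precise = coefficient_of_variation(precise)
cv_imprecise = coefficient_of_variation(imprecise)

# The standard deviation is the square root of the variance
assert math.isclose(statistics.stdev(imprecise),
                    math.sqrt(statistics.variance(imprecise)))

print(f"CV precise:   {cv_precise:.2f}%")
print(f"CV imprecise: {cv_imprecise:.2f}%")
```

The tightly clustered readings give a CV well under 1%, while the scattered readings give a larger CV.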
    • PERCENTILES – Divide the array of data into 100 equal parts.
    • DECILES – Divide the array of data into 10 equal parts.
    • QUARTILES – Divide the array of data into 4 equal parts.
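All three divisions can be sketched with `statistics.quantiles`, which returns n − 1 cut points for n equal parts (using Python's default `"exclusive"` method). The data set of the values 1 to 100 is hypothetical.

```python
import statistics

data = list(range(1, 101))  # hypothetical data: the values 1 to 100

quartiles = statistics.quantiles(data, n=4)      # 3 cut points -> 4 parts
deciles = statistics.quantiles(data, n=10)       # 9 cut points -> 10 parts
percentiles = statistics.quantiles(data, n=100)  # 99 cut points -> 100 parts

# Q2, D5 and P50 all mark the same position: the median
print(quartiles[1], deciles[4], percentiles[49], statistics.median(data))
# 50.5 50.5 50.5 50.5
```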
    • PROPERTIES OF THE MEAN - Unique - Amenable to algebraic manipulations - Not defined for qualitative data - Easily affected by extreme values
    • PROPERTIES OF THE MEDIAN - Unique - Not amenable to algebraic manipulations - Applicable only to variables for which ordering of values is possible - Not affected by extreme values
    • PROPERTIES OF THE MODE - Not affected by extreme values - Defined for both quantitative and qualitative data - Not necessarily unique (a data set may have more than one mode)
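A few of these properties can be checked directly on hypothetical data: one extreme value shifts the mean but not the median, the mode works on labels and can return several values, and the mean fails on labels.

```python
import statistics

scores = [10, 12, 13, 15, 16]
with_outlier = [10, 12, 13, 15, 100]  # 16 replaced by an extreme value

# Mean is pulled toward the extreme value; median stays the same
print(statistics.mean(scores), statistics.mean(with_outlier))
print(statistics.median(scores), statistics.median(with_outlier))

# Mode is defined for qualitative data and need not be unique
blood_types = ["A", "B", "A", "O", "B"]
print(statistics.multimode(blood_types))  # ['A', 'B']

# Mean is not defined for qualitative data
try:
    statistics.mean(blood_types)
except TypeError as err:
    print("mean of labels fails:", err)
```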
    • NOMINAL (descriptive) EXAMPLE:
      • Gender, Personal Preference, Nationality, Personality Type, Continent, Hair Color
    • ORDINAL (descriptive) EXAMPLE:
      • Educational Level, Socioeconomic Status, Income Level
    • INTERVAL (inferential) EXAMPLE:
      • Temperature in Celsius
    • RATIO (inferential) EXAMPLE:
      • Weight, Height, Temperature in Kelvin, Length of Time
    • LEVELS OF MEASUREMENT - A characteristic of the data collected, determined by the operational definition used in data collection. - Dictate the type of mathematical operations that can be performed. - Determine the appropriate statistical methods to be used.
    • NOMINAL (descriptive) - Data collected are simply labels, names, or categories without any explicit ordering of labels. - Observations with the same label belong to the same category. - Frequencies or counts of observations belonging to the same category can be obtained.
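Counting observations per category is the main operation nominal data supports. A sketch using `collections.Counter` on hypothetical hair-color responses:

```python
from collections import Counter

# Hypothetical nominal responses: labels with no order
hair_colors = ["black", "brown", "black", "blonde", "brown", "black"]

# Frequency (count) of each category
counts = Counter(hair_colors)

# The most frequent category is the mode
modal_category = counts.most_common(1)[0][0]

print(counts)
print(modal_category)
```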
    • ORDINAL (descriptive) - Data collected are labels or classes with an implied ordering of these labels. - Ranking can be done on the data. - Distance between 2 labels cannot be quantified.
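Ordinal labels can be ranked, but the gaps between ranks carry no meaning. This sketch maps hypothetical income-level labels to ranks, sorts them, and picks a median category with `statistics.median_low`, which returns an actual data value instead of averaging two ranks (averaging would assume equal distances between labels). The level names and responses are assumptions for illustration.

```python
import statistics

# Hypothetical ordered categories (lowest to highest)
LEVELS = ["low", "lower-middle", "middle", "upper-middle", "high"]
RANK = {label: i for i, label in enumerate(LEVELS)}

responses = ["middle", "low", "high", "middle", "upper-middle"]

# Ranking is allowed: sort the labels by their position in LEVELS
ranked = sorted(responses, key=RANK.get)

# Median category: median_low always returns one of the existing ranks
median_rank = statistics.median_low([RANK[r] for r in responses])
median_label = LEVELS[median_rank]

print(ranked)
print(median_label)
```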
    • INTERVAL (inferential) - Data collected can be ordered and, in addition, added or subtracted, but neither multiplied nor divided. - Zero point is arbitrary. - Indicates an actual amount (numerical): both the order of and the difference between values can be known. Its limitation is that it has no true zero.
    • RATIO (inferential) - Data collected have all the properties of the interval scale and, in addition, can be multiplied and divided. - Zero point is absolute. - As with the interval level, order and difference can be described; additionally, it has a true zero, so the ratio between two values is meaningful.
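Temperature shows the difference between the two scales. Differences agree in Celsius (interval) and Kelvin (ratio), but a ratio is only meaningful on the Kelvin scale, whose zero is absolute. The temperatures below are hypothetical; the conversion uses the standard offset of 273.15.

```python
def celsius_to_kelvin(c):
    # Kelvin has an absolute zero; Celsius sets 0 at water's freezing point
    return c + 273.15

c_low, c_high = 10.0, 20.0

# Differences are meaningful on both scales
diff_c = c_high - c_low
diff_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# A ratio on the Celsius scale is misleading: 20 °C is not "twice" 10 °C
naive_ratio = c_high / c_low

# The ratio on the Kelvin scale reflects the actual proportion
true_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(diff_c, diff_k)
print(naive_ratio, true_ratio)
```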
    • Statistics as a science - Deals with the collection, organization, presentation, analysis, and interpretation of data. - Is a study of variation. - Is a study of making objective or fair conclusions/generalizations about a larger set of data based on information from a part.
    • Statistic as a number - Is a collection of facts and figures - Is processed data - Is information