Introduction to Statistics


  • Statistics 
    • a branch of mathematics that deals with the collection, organization, presentation, analysis and interpretation of data. 
    • “systematic process”
  • Steps in Statistical Investigation
    1. Identification of the problem.
    2. Collection of Data - refers to the different methods and techniques of gathering the data.
    3. Presentation of Data - refers to the tabulation and organization of data in tables, graphs and charts.
    4. Analysis of Data - the process of deriving relevant information from the gathered data through the different statistical tools.
    5. Interpretation of Data - refers to the task of drawing conclusions or inferences from the analyzed data.
  • Population 
    • defined as the group of people, animals, places, things, or ideas to which any conclusion based on the characteristics of a sample will be applied
    • totality of all possible values of the variables 
    • Sample - a subgroup drawn from a population
  • Descriptive Statistics 
    • deals with describing the important characteristics of a given data.
    • it can be applied whether the set of data refers to a population or to a sample.
    • Statistical Tool : 
    • Measures of Central Tendency (mean, median, mode)
    • Measures of Variation/Dispersion (standard deviation, variance)
  • Inferential Statistics 
    • deals with drawing conclusions about the population using a sample.
    • Statistical Tool : 
    • t-test, z-test, ANOVA, Pearson r (Parametric tests)
    • Chi-square (Non-parametric test)
  • Data - refers to any information concerning a population or a sample.
  • Primary Data - refers to information which is gathered directly from the original source. Example: information that is gathered by an interviewer from an interviewee
  • Secondary Data - refers to information that is gathered from an existing source rather than directly from the original source. Example: information that is taken from a newspaper, published or unpublished book, thesis, etc.
  • Qualitative Data - are classified according to categories or attributes that are distinguished by some nonnumeric characteristics. Examples : gender, eye color, blood type, civil status
  • Quantitative Data - are classified according to numerical characteristics. 
    1. Discrete : the values are obtained by counting (exact) Ex. Number of students, Number of patients
    2. Continuous : the values are obtained by measuring. Ex. Temperature, area, distance, height, weight
  • Scales of Measurement 
    • Measurement : the value assigned to a variable
    • There are four levels of measurements used in preparing data for analysis.
    • Qualitative : Nominal, Ordinal
    • Quantitative : Interval, Ratio 
  • Nominal
    • classifies elements into two or more categories or classes
    • the numbers indicate only that the elements are different, not their order or magnitude
    • Order doesn’t matter 
    • Examples : Gender, hair color, genotype, religious preference, living accommodation
  • Ordinal 
    • measurements which deal with order or rank
    • provides information about relative comparison but the degrees or differences are not available
    • Examples : Class ranking, social classes, the Likert scale, level of satisfaction, educational attainment, time of day
  • Interval 
    • It is similar to ordinal level data because it has a definite ordering, but the differences between data values are also meaningful.
    • it also establishes a uniform unit in the scale so that any equal distance between two scores is of equal magnitude.
    • There is no absolute zero in this scale. 
    • Examples : Temperature (in degrees Celsius or Fahrenheit)
  • Ratio 
    • It is the interval level modified to include a true starting point, “zero”.
    • Ratios or proportions between values are meaningful.
    • There is absolute zero in this scale. 
    • Examples : scores on a test, income earned in a week, number of teachers 
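The difference between the interval and ratio levels can be checked with a small sketch. The values below are hypothetical, and the pound conversion factor is an approximation. On an interval scale (temperature in Celsius) the ratio of two values changes when the unit changes, so "twice as hot" is not meaningful; on a ratio scale (weight) the ratio stays the same.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kilograms):
    """Convert a weight from kilograms to pounds (approximate factor)."""
    return kilograms * 2.20462


# Interval scale: no absolute zero, so the ratio depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, ratio_fahrenheit)  # 2.0 1.36

# Ratio scale: absolute zero exists, so the ratio survives a unit change.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)
print(ratio_kg, ratio_lb)
```

The differences, by contrast, stay comparable on both scales, which is why interval data still supports addition and subtraction.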
  • Direct or Interview Method - It is a method where there is person-to-person interaction: an exchange of ideas between the one soliciting the information (interviewer) and the one supplying the information (interviewee). This method is applicable to a small sample or population size
  • Indirect or Questionnaire Method - A questionnaire is a list of well-planned questions written on paper which can be either personally administered or mailed by the researcher to the respondents
  • Registration Method - documentary analysis wherein data are gathered from facts or information on file. Examples are births, deaths, licenses, land lines, company registration, etc.
  • Observation Method - a scientific method of gathering data that makes possible use of all the senses to measure or to obtain results from the subject of the study
  • Experiment Method - A method of data collection wherein effort is made to control the factors affecting the variable in question. It examines the cause and effect of a certain phenomenon
  • Textual Form - data are presented in paragraph form
  • Tabular Form - data appear in rows and columns
  • Graphical Form - use of pictures to visualize data
  • Frequency Distribution - data are being presented in terms of lists, categories or classes, along with the number of data that fall into each category
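A frequency distribution can be sketched in a few lines of Python. The blood-type list below is made-up illustrative data, not from any real survey; `collections.Counter` from the standard library does the counting.

```python
from collections import Counter

# Hypothetical sample: blood types of 10 respondents (illustrative data only).
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "O", "B", "A"]

# Count how many observations fall into each category.
frequency = Counter(blood_types)

# Print the categories from most to least frequent.
for category, count in frequency.most_common():
    print(f"{category:<3} {count}")
```

The counts always add up to the number of observations, which is a quick way to check a hand-made frequency table.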
  • A measure of central tendency is usually called an average. It is a single value that represents a data set by locating its center. Averages play an important role in daily life and are an important tool in statistics. The mean, median and mode are the three kinds of “averages”, also called measures of central tendency.
  • Why is Central Tendency important?
    • It lets us know what is normal or ‘average’ for a set of data.
    • It condenses the data set down to one representative value, which is useful when you are working with large amounts of data.
    • It allows the comparison of one data set to another, as well as one piece of data to the entire data set.
    • Example: You could easily draw comparisons between the girls’ and boys’ heights by calculating the average height for each sample group.
  • Mean (balance point)
    • The mean by definition is the sum of all the values in the observation or a dataset divided by the total number of observations. This is also known as the arithmetic average.
  • Median (physical middle point) 
    • The median of a set of data values is the middle value of the data set when it has been arranged in either ascending or descending order. That is, from the smallest value to the highest value or vice versa.
  • Mode (most frequent) 
    • The mode is the most frequently occurring value in a set of data. It is found by organizing the data and counting the frequency of each value. The value with the highest number of occurrences is the mode of the set. A set of data may have one mode (unimodal), two modes (bimodal) or many modes (multimodal). If no value is repeated, then there is no mode for the list.
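The three averages can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is hypothetical data chosen so the arithmetic is easy to check by hand.

```python
import statistics

scores = [70, 85, 85, 90, 95]  # hypothetical test scores

mean = statistics.mean(scores)        # sum of values / number of values
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # list of every most-frequent value

print(mean, median, modes)

# A bimodal data set: two values tie for the highest frequency.
bimodal = statistics.multimode([1, 1, 2, 2, 3])
print(bimodal)  # [1, 2]
```

Note that `statistics.multimode` returns every value when none is repeated, so the "no mode" convention used in these notes has to be checked separately, for example by testing whether the highest frequency is 1.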
  • A measure of variation, also called a measure of dispersion, indicates the level of consistency in the data. It is used to describe the distribution of the data. It answers the questions:
    • How is the data distributed?
    • Is it clustered in one area or is it really spread out?
  • Range 
    • It is the simplest measure of variation.
    • It is greatly affected by extreme values and it is not resistant to change since it only uses the largest (maximum) and smallest (minimum) values.
    • Not a good measure of variability.
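A short sketch of why the range is sensitive to extreme values. The data are hypothetical, and `value_range` is an illustrative helper name, not a library function.

```python
def value_range(values):
    """Range = largest (maximum) value minus smallest (minimum) value."""
    return max(values) - min(values)


data = [4, 7, 9, 10, 12]     # hypothetical data set
with_outlier = data + [100]  # same data plus one extreme value

print(value_range(data))          # 8
print(value_range(with_outlier))  # 96
```

A single extreme value changes the range from 8 to 96 even though the other five values are untouched.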
  • Mean Absolute Deviation (MAD) 
    • it is the average of the absolute differences between the data values and the mean.
    • A small MAD value indicates clustered data values.
    • A big MAD value indicates spread out data values.
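The mean absolute deviation can be sketched directly from its definition. Both data sets below are hypothetical and share the same mean (10), so only their spread differs; `mean_absolute_deviation` is an illustrative helper name.

```python
def mean_absolute_deviation(values):
    """Average of the absolute differences between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


clustered = [9, 10, 10, 11]  # values close to the mean
spread_out = [1, 5, 15, 19]  # values far from the mean

print(mean_absolute_deviation(clustered))   # 0.5
print(mean_absolute_deviation(spread_out))  # 7.0
```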
  • Variance 
    • Represented by σ^2 (the Greek letter sigma, squared) for a population or s^2 for a sample.
    • It is the average of the squared deviations of the data values from their mean. It is used to determine how spread out or clustered the data points are around their average value.
  • Standard Deviation 
    • It is also a reliable measure of dispersion. It is the square root of the variance.
    • It is used to quantify the amount of variation or dispersion of a set of data values and is represented by the Greek letter sigma σ or the Latin letter s.
    • A low standard deviation indicates that the data points tend to be close to the mean, while a higher value indicates a wide spread from the mean. The group with smaller variability is more homogeneous than the group with bigger variability.
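Variance and standard deviation can be checked with the standard `statistics` module. This is a sketch on hypothetical data with mean 5; it assumes the usual convention that the population versions divide by n and the sample versions divide by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set, mean = 5

# Squared deviations from the mean add up to 32 for this data.
population_variance = statistics.pvariance(data)  # 32 / 8
population_sd = statistics.pstdev(data)           # square root of the above

sample_variance = statistics.variance(data)       # 32 / 7
sample_sd = statistics.stdev(data)                # square root of the above

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

The sample versions come out slightly larger because dividing by n - 1 corrects for estimating the mean from the same sample.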
  • Variance measures the dispersion of a set of data points around their mean. 
  • Advantages of Variance and Standard Deviation
    1. Takes into account every value in the data set. 
    2. Most reliable measure of variability. 
    3. Mathematically logical
    4. Amenable to further mathematical manipulations. 
  • Disadvantages of Variance and Standard Deviation
    1. Harder to compute and more difficult to understand. 
    2. Generally affected by extreme values that may/may not skew the data. 
  • When to use the Variance and Standard Deviation?
    • When a dependable measure of variability is needed. 
    • When further statistical analysis is needed.
    • Most widely used measure of variability and easiest to handle algebraically.
  • Measures of Relative Position - They determine the position of a single value in relation to the other values in a sample or population data set. Also called fractiles. They divide an ordered score distribution into equal parts.
  • Quartile (k = 1 - 4) - a measure of position that divides the ordered observations or score distribution into 4 equal parts
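Quartiles can be sketched with `statistics.quantiles`. Several conventions exist for computing quartiles, so results may differ slightly from a textbook formula; the sketch below relies on the function's default "exclusive" method and uses hypothetical scores.

```python
import statistics

scores = [7, 3, 1, 6, 2, 5, 4]  # hypothetical scores, in no particular order

# n=4 asks for the three cut points that split the ordered data into 4 parts.
# quantiles() sorts the data internally before locating the cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(q1, q2, q3)  # 2.0 4.0 6.0
```

The second quartile matches the median of the data, which is a useful sanity check on any quartile calculation.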