terms

    Cards (102)

    • Statistics
      The science of collecting, classifying, and interpreting data
    • Observational Study
      Observe a group and measure quantities of interest. This is passive data collection in that one does not attempt to influence the group. The purpose of the study is to describe the group.
    • Experiment
      Deliberately impose treatments on groups in order to observe responses. The purpose is to study whether the treatments cause a change in the responses.
    • Population
      Entire group of interest
    • Sample
      A part of the population selected to draw conclusions about the entire population
    • Census
      A sample that attempts to include the entire population
    • Parameter
      A number that describes the population (examples: mean, median, standard deviation, variance)
    • Statistic
      A number produced from a sample that estimates a population parameter
    • Experimental Group
      A collection of experimental units subjected to a difference in treatment, imposed by the experimenter
    • Control Group
      A collection of experimental units subjected to the same conditions as those in an experimental group except that no treatment is imposed
    • Confounding Effects
      When you have multiple factors in a study and you can't tell which factor causes a change in the variable of interest
    • Variable
      Any characteristic or quantity to be measured on units in a study
    • Categorical Variable
      Places a unit into one of several categories (examples: gender, race, political party)
    • Quantitative Variable
      Takes on numerical values for which arithmetic makes sense (examples: SAT score, number of siblings, cost of textbooks)
    • Univariate
      Data has one variable
    • Bivariate
      Data has two variables
    • Multivariate
      Data has three or more variables
    • Frequency
      Number of times the value occurs in the data
    • Relative frequency
      Proportion of the data with the value
    • "Typical" Concentration
      Generally characterized by the center of the data (the mean of the sample/population)
    • Spread
      Wide vs. narrow: describes how far the data values are dispersed around the center
    • Histogram
      Bar graph of binned or grouped data where the height of the bar above each bin denotes the frequency (or relative frequency) of values in the bin
    • How do you choose the number of histogram bins?
      Rule of thumb: number of bins ≈ sqrt(number of observations), rounded to a whole number
    • Symmetric data
      Has roughly the same mirror image on each side of a center value
    • Skewed data
      Has one tail (either left or right) that is much longer than the other, relative to the mode (peak value).

      The data is skewed in the direction of the longer tail, which is the direction you'd expect a skier on the slope to fall.

      Example: hump on the left, long tail to the right, so the data is right-skewed.
    • Multi-modal data
      Data has more than one mode
    • Measures of Central Tendency
      Sample median and sample mean
    • Sample Median
      Middle observation when the values are arranged in increasing order. If the number of observations is even, it is the average of the two middle observations.
    • Sample mean
      The average of n observations, i.e., the sum of the values divided by the number of observations, n.
    • First Quartile (Q1)
      25th percentile of the data
    • Second Quartile (Q2)
      50th percentile of the data (median)
    • Third Quartile (Q3)
      75th percentile of the data
    • 5-number summary
      Min, Q1, Q2, Q3, and Max values
    • Boxplots
      Graphical display of the 5-number summary
    • Measures of Spread
      Interquartile range, sample variance, and sample standard deviation
    • Interquartile Range
      IQR = Q3 - Q1 = range of the middle 50% of the data
    • Sample Variance
      Sum of squared deviations from the sample mean divided by n-1

      The n-1 divisor is the number of degrees of freedom: once the sample mean is fixed, only n-1 of the deviations are free to vary.
    • Sample Standard Deviation
      Square root of the variance, where the variance is the sum of squared deviations from the sample mean divided by n-1.
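
The descriptive-statistics cards above (frequency, relative frequency, histogram bins, sample mean and median, quartiles, 5-number summary, IQR, sample variance, and sample standard deviation) can be tried out with a minimal Python sketch. The data values are made up for illustration, and the "inclusive" quartile method is an assumption: textbooks and libraries use several slightly different quartile conventions, so Q1 and Q3 may not match a hand calculation exactly.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample, made up for illustration only.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
n = len(data)

# Frequency: number of times each value occurs.
frequency = Counter(data)
# Relative frequency: proportion of the data with each value.
relative_frequency = {value: count / n for value, count in frequency.items()}

# Measures of central tendency.
mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # averages the two middle values when n is even

# Quartiles (assumed convention: the "inclusive" method of the statistics module).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1  # range of the middle 50% of the data

# Measures of spread; both functions divide by n - 1.
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# Square-root rule of thumb for the number of histogram bins, rounded up here.
num_bins = math.ceil(math.sqrt(n))

print("frequency:", dict(frequency))
print("mean:", mean, "median:", median)
print("5-number summary:", five_number_summary, "IQR:", iqr)
print("variance:", variance, "standard deviation:", std_dev)
print("histogram bins:", num_bins)
```

Rounding the bin count up rather than down is a choice made for this sketch; the rule of thumb itself only gives an approximate starting point.
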
    • Random Process
      A process whose outcome cannot be predicted with certainty
    • Sample Space
      The collection of all possible outcomes to a random process
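
As a small illustration of the last two cards, the sketch below enumerates the sample space of a hypothetical random process (flipping a coin twice) and then simulates one run of it. The coin example and all names in the code are assumptions chosen for illustration.

```python
import random
from itertools import product

# Hypothetical random process: flipping a coin twice.
coin_faces = ("H", "T")

# Sample space: every possible ordered pair of flips.
sample_space = list(product(coin_faces, repeat=2))
for outcome in sample_space:
    print(outcome)

# One run of the process: the result cannot be predicted with certainty,
# but it is always a member of the sample space.
observed = random.choice(sample_space)
print("observed outcome:", observed)
```

Treating each pair as equally likely in `random.choice` corresponds to assuming a fair coin.
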