terms

Cards (102)

  • Statistics
    The science of collecting, classifying, and interpreting data
  • Observational Study
    Observe a group and measure quantities of interest. This is passive data collection: the observer does not attempt to influence the group. The purpose of the study is to describe the group.
  • Experiment
    Deliberately impose treatments on groups in order to observe responses. The purpose is to study whether the treatments cause a change in their responses.
  • Population
    Entire group of interest
  • Sample
    A part of the population selected to draw conclusions about the entire population
  • Census
    A sample that attempts to include the entire population
  • Parameter
    A numerical value that describes the population (examples: mean, median, standard deviation, variance)
  • Statistic
    A number produced from a sample that estimates a population parameter
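
To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population (the integers 1 to 100), the sample size of 10, and the random seed are all made up for illustration and do not come from the cards.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (made up for illustration).
population = list(range(1, 101))

# Parameter: a value computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: a value computed from a sample, used to estimate the parameter.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=10)
sample_mean = statistics.mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (statistic):", sample_mean)
```

A different seed gives a different sample and so a different statistic, while the parameter stays fixed.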
  • Experimental Group
    A collection of experimental units subjected to a difference in treatment, imposed by the experimenter
  • Control Group
    A collection of experimental units subjected to the same conditions as those in an experimental group except that no treatment is imposed
  • Confounding Effects
    When you have multiple factors in a study and you can't tell which factor causes a change in the variable of interest
  • Variable
    Any characteristic or quantity to be measured on units in a study
  • Categorical Variable
    Places a unit into one of several categories (examples: gender, race, political party)
  • Quantitative Variable
    Takes on numerical values for which arithmetic makes sense (examples: SAT score, number of siblings, cost of textbooks)
  • Univariate
    Data has one variable
  • Bivariate
    Data has two variables
  • Multivariate
    Data has three or more variables
  • Frequency
    Number of times the value occurs in the data
  • Relative frequency
    Proportion of the data with the value
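
Frequency and relative frequency can be sketched in a few lines of Python. The six data values below are invented purely for illustration.

```python
from collections import Counter

# Hypothetical values of a categorical variable (made up for illustration).
data = ["A", "B", "A", "C", "A", "B"]

# Frequency: number of times each value occurs in the data.
frequency = Counter(data)

# Relative frequency: proportion of the data with each value.
n = len(data)
relative_frequency = {value: count / n for value, count in frequency.items()}

print(dict(frequency))
print(relative_frequency)
```

The relative frequencies always sum to 1, which is a quick sanity check on the counts.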
  • "Typical" Concentration
    Generally characterized by the center of the data (the mean of the sample/population)
  • Spread
    How wide or narrow the distribution is; describes how far the data values are dispersed around the center
  • Histogram
    Bar graph of binned or grouped data where the height of the bar above each bin denotes the frequency (or relative frequency) of values in the bin
  • How do you choose the number of histogram bins?
    Rule of thumb: # of bins ≈ sqrt(number of observations), rounded to a whole number
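
The square-root rule is easy to compute. In this Python sketch the function name `sqrt_rule_bins` is hypothetical, and rounding up (rather than down or to nearest) is an assumption, since the card does not say how to round.

```python
import math

def sqrt_rule_bins(n_observations: int) -> int:
    """Number of histogram bins by the square-root rule.

    Rounding up is an assumption; the rule itself only gives sqrt(n).
    """
    if n_observations <= 0:
        raise ValueError("need at least one observation")
    return math.ceil(math.sqrt(n_observations))

print(sqrt_rule_bins(100))
print(sqrt_rule_bins(50))
```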
  • Symmetric data
    Has roughly the same mirror image on each side of a center value
  • Skewed data
    Has one tail (either left or right) that is much longer than the other, relative to the mode (peak value).

    The data are skewed in the direction a skier on the slope would fall, that is, toward the long tail.

    Example: a hump on the left with a long tail to the right is right-skewed.
  • Multi-modal data
    Data has more than one mode
  • Measures of Central Tendency
    Sample median and sample mean
  • Sample Median
    Middle observation when the values are arranged in increasing order; with an even number of observations, the average of the two middle values
  • Sample mean
    The average of n observations, i.e., the sum of the values divided by the number of observations n.
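
A short Python sketch of both measures of central tendency follows. The helper names and data values are made up for illustration, and for an even number of observations the sketch uses the common convention of averaging the two middle values.

```python
def sample_mean(values):
    """Sum of the values divided by the number of observations n."""
    return sum(values) / len(values)

def sample_median(values):
    """Middle observation after sorting in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the two middle observations (common convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [10, 2, 4, 5, 4]  # hypothetical observations
print(sample_mean(data), sample_median(data))
```

Python's standard `statistics.mean` and `statistics.median` give the same results; the hand-written versions are shown only to mirror the definitions.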
  • First Quartile (Q1)
    25th percentile of the data
  • Second Quartile (Q2)
    50th percentile of the data (median)
  • Third Quartile (Q3)
    75th percentile of the data
  • 5-number summary
    Min, Q1, Q2, Q3, and Max values
  • Boxplots
    Graphical display of the 5-number summary
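
The 5-number summary that a boxplot displays can be computed with Python's standard library. This is a sketch under assumptions: the data values are invented, and quartile conventions differ between textbooks and software, so the `"inclusive"` method used here is only one reasonable choice (it needs Python 3.8 or later).

```python
import statistics

# Hypothetical observations (made up for illustration).
data = [9, 1, 8, 2, 7, 3, 6, 4, 5]

# Quartiles via linear interpolation, treating the data as including
# its own minimum and maximum ("inclusive" method).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = {
    "min": min(data),
    "Q1": q1,
    "Q2 (median)": q2,
    "Q3": q3,
    "max": max(data),
}
print(five_number_summary)
```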
  • Measures of Spread
    Interquartile range, sample variance, and sample standard deviation
  • Interquartile Range
    IQR = Q3 - Q1 = range of the middle 50% of the data
  • Sample Variance
    Sum of squared deviations from the sample mean divided by n-1

    The n-1 divisor is the degrees of freedom: one degree is used up by estimating the mean from the same sample.
  • Sample Standard Deviation
    Square root of the sample variance (the sum of squared deviations from the sample mean divided by n-1).
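
The three measures of spread can be sketched together in Python. The data values are invented for illustration, and the quartiles behind the IQR use the standard library's `"inclusive"` method, which is an assumption because quartile conventions vary.

```python
import math
import statistics

# Hypothetical observations (made up for illustration).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the sample mean,
# divided by n - 1 (the degrees of freedom).
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_std = math.sqrt(sample_variance)

# Interquartile range: Q3 - Q1, the range of the middle 50% of the data.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(sample_variance, sample_std, iqr)
```

The hand-computed variance should agree with `statistics.variance(data)`, which also divides by n-1.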
  • Random Process
    A process whose outcome cannot be predicted with certainty
  • Sample Space
    The collection of all possible outcomes to a random process
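
As an illustration of a random process and its sample space, the Python sketch below uses two flips of a coin. The coin example is an assumption chosen for simplicity; it does not appear on the cards.

```python
import random
from itertools import product

# Hypothetical random process: flipping a coin twice.
coin = ["H", "T"]

# Sample space: the collection of all possible outcomes.
sample_space = list(product(coin, repeat=2))
print(sample_space)

# One run of the random process: the outcome cannot be predicted with
# certainty, but it always belongs to the sample space.
outcome = random.choice(sample_space)
print(outcome)
```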