buma descriptive statistical measures

Cards (45)

  • A population includes all of the entities of interest in a study. 
  • A sample is a subset of the population, often randomly chosen and preferably representative of the population as a whole. 
  • POPULATION
    • The measurable quality is called a parameter
    • The population is a complete set.
    • Reports are a true representation of opinion.
    • It contains all members of a specified group. 
  • SAMPLE
    • The measurable quality is called a statistic.
    • The sample is a subset of the population.
    • Reports have a margin of error and confidence interval.
    • It is a subset that represents the entire population.
  • A data set is generally a rectangular array of data where the columns contain variables, such as height, gender, and income, and each row contains an observation.
  • A variable (column) is often called a field or an attribute.
  • An observation (row) is often called a case or a record.
  • The most common arrangement by far is to have variables in columns, with variable names in the top row, and observations in the remaining rows.
  • A variable (or field or attribute) is a characteristic of members of a population, such as height, gender, or salary.
  • An observation (or case or record) is a list of all variable values for a single member of a population. 
  • There are several ways to categorize data. A basic distinction is between numerical and categorical data.
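  • A variable such as Opinion is less obvious to classify. It is expressed numerically, on a 1-to-5 scale, but the numbers act as codes for ordered categories rather than as measured quantities.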
  • QUANTITATIVE DATA - can be expressed as a number or can be quantified.
  • QUALITATIVE DATA - can’t be expressed as a number and can’t be measured.
  • Qualitative data consist of words, pictures, and symbols, not numbers.
  • NOMINAL DATA - used for labeling variables without any type of quantitative value. The name ‘nominal’ comes from the Latin word ‘nomen’, which means ‘name’.
  • Nominal data could simply be called ‘labels’.
  • ORDINAL DATA - placed into some kind of order by their position on a scale. 
  • Ordinal data may indicate rank or superiority (for example, first, second, third), but not the size of the difference between positions.
  • DISCRETE DATA - a count that involves only integers. Discrete values cannot be subdivided into parts.
  • CONTINUOUS DATA - can be meaningfully divided into finer levels. It is measured on a scale or continuum and can take any numeric value.
  • Descriptive statistics used to analyse data for a single categorical variable include frequencies, percentages, fractions and/or relative frequencies (which are simply frequencies divided by the sample size) obtained from the variable's frequency distribution table.
  • There are three common measures of central tendency, all of which try to answer the basic question of which value is most “typical.” These are the mean, the median, and the mode.
  • The mean is the average of all values.
  • If the data set represents a sample from some larger population, this measure is called the sample mean.
  • If the data set represents the entire population, it is called the population mean and is denoted by μ (the Greek letter mu).
  • The most widely used measure of central tendency is the mean, or arithmetic average. It is the sum of all the scores in a distribution divided by the number of cases.
  • The median is the middle observation when the data are sorted from smallest to largest.
  • The mode is the value that appears most often.
  • For variables that are essentially continuous, the mode is not very interesting because it is often the result of a few lucky ties.
  • The mode is the value in a distribution that occurs most frequently.
  • The mode is the least useful indicator of central value in a distribution.
  • A distribution is symmetrical when the two halves are mirror images of each other.
  • In a symmetrical distribution, the values of the mean and the median coincide.
  • If a distribution is not symmetrical, it is described as skewed, pulled out to one end or the other by the presence of extreme scores.
  • In skewed distributions, the values of the measures of central tendency differ.
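  • Skews are labeled according to where the extreme scores lie: a long tail to the right is a positive (right) skew, and a long tail to the left is a negative (left) skew. A way to remember this is “The tail names the beast.”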
  • RANGE - the largest value minus the smallest value.
  • STANDARD DEVIATION - a statistic that measures the dispersion of a data set relative to its mean. It is calculated as the square root of the variance.
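
The measures on these cards can be tried out with a short Python sketch. This is only an illustration using the standard library. The `opinions` and `salaries` lists are invented data, not values from the course material, and the variable names are hypothetical. The sketch builds a frequency distribution for a categorical variable, then computes the mean, median, mode, range, variance, and standard deviation for a numerical variable.

```python
from collections import Counter
import statistics

# Hypothetical sample data, invented purely for illustration.
opinions = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
salaries = [30, 32, 35, 35, 38, 40, 100]  # in thousands; 100 is an extreme score

# Frequency distribution for a single categorical variable.
frequencies = Counter(opinions)
# Relative frequency = frequency divided by the sample size.
relative_frequencies = {
    category: count / len(opinions) for category, count in frequencies.items()
}

# Measures of central tendency.
mean = statistics.mean(salaries)      # sum of values / number of cases
median = statistics.median(salaries)  # middle value of the sorted data
mode = statistics.mode(salaries)      # most frequent value

# Measures of dispersion.
value_range = max(salaries) - min(salaries)      # largest - smallest
sample_variance = statistics.variance(salaries)  # sample variance (divides by n - 1)
sample_std = statistics.stdev(salaries)          # square root of the sample variance
population_std = statistics.pstdev(salaries)     # treats the data as a whole population

print("Frequencies:", dict(frequencies))
print("Relative frequencies:", relative_frequencies)
print("Mean:", round(mean, 2))
print("Median:", median)
print("Mode:", mode)
print("Range:", value_range)
print("Sample standard deviation:", round(sample_std, 2))
print("Population standard deviation:", round(population_std, 2))
```

In this invented data set the single extreme score of 100 pulls the mean (about 44.29) above the median (35), which is the pattern expected in a right-skewed distribution. The sample and population standard deviations differ because the first divides by n - 1 and the second by n.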