BIOSTATISTICS

  • Data are collections of observations, such as measurements or survey responses. (A single data value is called a datum, a term rarely used. The term "data" is plural, so it is correct to say "data are..." not "data is...")
  • Statistics is the science of planning studies and experiments; obtaining data; and organizing, summarizing, presenting, analyzing, and interpreting those data and then drawing conclusions based on them.
  • A population is the complete collection of all measurements or data that are being considered. Typically, the population is the complete collection of data that we would like to make inferences about.
  • A census is the collection of data from every member of the population.
  • A sample is a subcollection of members selected from a population.
  • Data collection is an integral part of research in any field of study, such as the physical and social sciences, humanities, mathematics, and business. The goal of all data collection is to capture quality evidence that supports analysis leading to sound and valid answers to the questions of scientific inquiry.
  • This process consists of "prepare, analyze, and conclude."
  • A voluntary response sample (or self-selected sample) is one in which respondents themselves decide whether to be included.
  • Statistical significance is achieved in a study when we get a result that is very unlikely to occur by chance. A common criterion is that we have statistical significance if the likelihood of an event occurring by chance is 5% or less.
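
As a rough illustration of the 5% criterion, the sketch below computes the chance of a result at least as extreme as the one observed, assuming a fair 50/50 process. The scenario (16 or more successes in 20 trials) and the names `prob_at_least`, `p_chance`, and `is_significant` are hypothetical and are not taken from the cards.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials,
    each with success probability p (exact binomial sum)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical result: 16 or more successes in 20 trials of a 50/50 process.
p_chance = prob_at_least(16, 20)

# Common criterion from the card: significant if the chance is 5% or less.
is_significant = p_chance <= 0.05
print(f"chance = {p_chance:.4f}, significant = {is_significant}")
```

Under these assumptions the chance is well below 5%, so such a result would meet the common criterion. That says nothing about whether the effect is large enough to matter in practice.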
  • Practical significance: It is possible that some treatment or finding is effective, but common sense might suggest that the treatment or finding does not make enough of a difference to justify its use or to be practical.
  • Misleading Conclusions
    When forming a conclusion based on a statistical analysis, we should make statements that are clear even to those who have no understanding of statistics and its terminology. We should carefully avoid making statements not justified by the statistical analysis.
  • Sample Data Reported Instead of Measured
    When collecting data from people, it is better to take measurements yourself instead of asking subjects to report results.
  • Loaded Questions
    If survey questions are not worded carefully, the results of a study can be misleading.
  • Order of Questions
    Sometimes survey questions are unintentionally loaded by such factors as the order of the items being considered.
  • Nonresponse
    A nonresponse occurs when someone either refuses to respond to a survey question or is unavailable.
  • Percentages
    Some studies cite misleading or unclear percentages. Note that 100% of some quantity is all of it, but if there are references made to percentages that exceed 100%, such references are often not justified.
  • BASIC TYPES OF DATA
    A parameter is a numerical measurement describing some characteristic of a population.
    A statistic is a numerical measurement describing some characteristic of a sample.
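
A small sketch of the distinction, assuming a made-up population of 1,000 measurements: the mean of the whole population is a parameter, while the mean of a sample drawn from it is a statistic. All values and names here (`population`, `sample`, `parameter`, `statistic`) are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 simulated measurements.
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: describes the entire population.
parameter = mean(population)

# Statistic: describes a sample of 50 members drawn from that population.
sample = rng.sample(population, 50)
statistic = mean(sample)

print(f"parameter (population mean) = {parameter:.2f}")
print(f"statistic (sample mean)     = {statistic:.2f}")
```

The statistic usually differs somewhat from the parameter, and it changes from sample to sample.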
  • QUANTITATIVE/CATEGORICAL
    Quantitative (or numerical) data consist of numbers representing counts or measurements.
    Categorical (or qualitative or attribute) data consist of names or labels (not numbers that represent counts or measurements).
  • Discrete data

    Quantitative data with a finite or "countable" number of values
  • Discrete data

    • Number of tosses of a coin before getting tails
    • Number of births in Houston before getting a male
  • Continuous (numerical) data

    Quantitative data with infinitely many possible values, where the collection of values is not countable
  • Continuous (numerical) data

    • Lengths of distances from 0 cm to 12 cm
  • NOMINAL LEVEL
    The nominal level of measurement is characterized by data that consist of names, labels, or categories only. It is not possible to arrange the data in some order (such as low to high).
  • ORDINAL LEVEL
    Data are at the ordinal level of measurement if they can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless.
  • INTERVAL LEVEL
    Data are at the interval level of measurement if they can be arranged in order, and differences between data values can be found and are meaningful; but data at this level do not have a natural zero starting point at which none of the quantity is present.
  • RATIO LEVEL
    Data are at the ratio level of measurement if they can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present). For data at this level, differences and ratios are both meaningful.
  • Big data refers to data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of big data may require software simultaneously running in parallel on many different computers.
    Data science involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as biology and epidemiology).
  • MISSING DATA
    A data value is missing completely at random if the likelihood of its being missing is independent of its value or any of the other values in the data set. That is, any data value is just as likely to be missing as any other data value.
    A data value is missing not at random if the missing value is related to the reason that it is missing.
  • The Gold Standard: Randomization with placebo/treatment groups is sometimes called the “gold standard” because it is so effective. (A placebo such as a sugar pill has no medicinal effect.)
  • In an experiment, we apply some treatment and then proceed to observe its effects on the individuals. (The individuals in experiments are called experimental units, and they are often called subjects when they are people.)
    In an observational study, we observe and measure specific characteristics, but we do not attempt to modify the individuals being studied.
  • Design of Experiments
    1. Replication: The repetition of an experiment on more than one individual. Good use of replication requires sample sizes that are large enough so that we can see effects of treatments.
    2. Blinding: In effect when the subject doesn’t know whether he or she is receiving a treatment or a placebo.
    3. Randomization: Used when individuals are assigned to different groups through a process of random selection.
  • A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.
  • In systematic sampling, we select some starting point and then select every kth (such as every 50th) element in the population.
  • With convenience sampling, we simply use data that are very easy to get.
  • In stratified sampling, we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender). Then we draw a sample from each subgroup (or stratum).
  • In cluster sampling, we first divide the population area into sections (or clusters). Then we randomly select some of those clusters and choose all the members from those selected clusters.
  • In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.
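
The sampling methods above can be sketched in a few lines each. This is a minimal illustration on a hypothetical population of 100 subject IDs; the function names, the strata (odd/even IDs), and the clusters (blocks of ten IDs) are assumptions made for the example. The stratified version draws a simple random sample within each stratum, which is one common choice rather than something the cards specify.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Every possible sample of size n has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random starting point, then take every kth element."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split into strata, then draw a sample from each stratum."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Split into clusters, randomly pick clusters, keep all their members."""
    clusters = defaultdict(list)
    for member in population:
        clusters[cluster_of(member)].append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for c in chosen for m in clusters[c]]

rng = random.Random(1)
population = list(range(1, 101))  # hypothetical subject IDs 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, lambda x: x % 2, 5, rng)
clustered = cluster_sample(population, lambda x: (x - 1) // 10, 2, rng)
```

A multistage design would chain these steps, for example selecting clusters first and then taking a simple random sample within each selected cluster.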
  • OBSERVATIONAL STUDIES
    In a cross-sectional study, data are observed, measured, and collected at one point in time, not over a period of time.
    In a retrospective (or case-control) study, data are collected from a past time period by going back in time (through examination of records, interviews, and so on).
    In a prospective (or longitudinal or cohort) study, data are collected in the future from groups that share common factors (such groups are called cohorts).
  • Confounding
    Confounding occurs when we can see some effect, but we can't identify the specific factor that caused it.
  • Completely Randomized Experimental Design

    Assign subjects to different treatment groups through a process of random selection
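
A minimal sketch of a completely randomized assignment, assuming 12 hypothetical subjects and two groups. The helper name `completely_randomized_assignment` and the group labels are illustrative only. Shuffling first and then dealing subjects out in turn keeps the group sizes as equal as possible.

```python
import random

def completely_randomized_assignment(subjects, groups, rng):
    """Randomly assign each subject to one of the treatment groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random order decides who lands in which group
    assignment = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

rng = random.Random(42)
subjects = [f"S{i:02d}" for i in range(1, 13)]  # hypothetical subject labels
assignment = completely_randomized_assignment(
    subjects, ["treatment", "placebo"], rng
)

for group, members in assignment.items():
    print(group, sorted(members))
```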