Statistics is concerned with the collection, organization, summarization, and analysis of data
Statistics involves drawing inferences about a body of data when only a part of the data is observed
The purpose of statistics is to investigate and evaluate the nature of numbers and the meaning of obtained information
Sources of data:
Documented data: primary data documented by the primary source and secondary data documented by a secondary source
Survey: method of collecting data by asking people questions
Experiments: method of collecting data in which the researcher directly intervenes in the conditions being studied
Observation: method of collecting data by recording observations
Other sources: internal data, registration, computer simulations
Biostatistics is the application of statistical tools and concepts in the biological sciences and medicine
Variables:
A variable is a characteristic that takes on different values in different persons, places, or things
Quantitative variables can be measured and convey information regarding amount
Qualitative variables cannot be measured numerically; they are categorized and convey information regarding attribute
Random variable:
A random variable is a variable whose values arise as a result of chance factors and cannot be exactly predicted in advance
Discrete random variable: characterized by gaps or interruptions in the values it can assume
Continuous random variable: does not possess gaps or interruptions, can assume any value within a specified interval
Population: a collection of all units from which data is collected
Sample: a subset or representative part of the population
Measurement: assigning a number to a characteristic being measured
Scales of measurement:
Nominal scale: possesses only the property of identity
Ordinal scale: possesses identity and order but not equality of scale
Interval scale: possesses identity, order, and equality of scale but not absolute zero
Ratio scale: possesses all properties of identity, order, equality of scale, and absolute zero
Sampling methods:
Probability sampling: every element in the population has a known, non-zero chance of being chosen
Simple random sampling: all possible subsets of n units have the same chance of selection
Systematic random sampling: selection of the first element is random and subsequent elements are taken at regular intervals
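The two selection rules above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the notes: the sampling frame is a hypothetical list of 100 unit IDs, the function names are made up for this example, and the systematic version assumes the interval k is the frame size divided by n, rounded down.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start within the first interval, then take every k-th unit."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval (assumed: floor of N / n)
    start = rng.randrange(k)      # random start position in 0 .. k-1
    return [population[start + i * k] for i in range(n)]

# Hypothetical sampling frame: unit IDs 1 to 100
units = list(range(1, 101))
srs = simple_random_sample(units, 10, seed=1)
sys_sample = systematic_sample(units, 10, seed=1)
```

In the systematic sample only the starting position is random; once it is fixed, the remaining units follow at a constant interval of k.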
Stratified sampling: dividing the population into nonoverlapping subpopulations and selecting samples from each stratum
Proportional allocation: the sample size taken from each stratum is proportional to the size of the stratum, so every element in the population has the same probability of selection
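As a rough sketch of proportional allocation, the sample size for a stratum can be computed as n times the stratum size divided by the population size, followed by a simple random sample within each stratum. The strata names and sizes below are hypothetical, the function names are invented for this example, and rounding is a simplification: with other numbers the rounded allocations may not add up exactly to n.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum: n_h = n * N_h / N, rounded to a whole number."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from every stratum using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}

# Hypothetical population of 1000 units split into three nonoverlapping strata
strata = {
    "urban": list(range(0, 500)),
    "suburban": list(range(500, 800)),
    "rural": list(range(800, 1000)),
}
allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 100)
sample = stratified_sample(strata, 100, seed=7)
```

With these sizes the allocation is 50, 30, and 20 units, so each element has a 1 in 10 chance of selection regardless of its stratum.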
Cluster sampling divides the population into nonoverlapping groups or clusters, selects a sample of clusters, and includes all elements in the selected clusters
In two-stage sampling, clusters are selected at the first stage and the elements in the sample are selected from within those clusters at the second stage; three-stage sampling adds one more level of selection, so the elements are identified at the third stage
Multistage sampling is a natural extension of one-stage cluster sampling and is more cost-efficient when clusters are large and the elements within each cluster are homogeneous
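The difference between one-stage cluster sampling and two-stage sampling can be illustrated with a short sketch. The village and household labels are hypothetical, the function names are invented for this example, and simple random sampling is assumed at every stage, which the notes do not specify.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Select m clusters at random and keep ALL elements of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: list(clusters[name]) for name in chosen}

def two_stage_sample(clusters, m, n_per_cluster, seed=None):
    """Stage 1: select m clusters. Stage 2: subsample elements within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: 8 villages (clusters) of 20 households each
clusters = {
    f"village_{i}": [f"village_{i}_household_{j}" for j in range(20)]
    for i in range(8)
}
one_stage = one_stage_cluster_sample(clusters, 3, seed=3)
two_stage = two_stage_sample(clusters, 3, 5, seed=3)
```

Here the one-stage design collects all 60 households in the three selected villages, while the two-stage design collects only 15, which is where the cost saving for large, internally homogeneous clusters comes from.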
Nonprobability sampling methods do not use randomization and allow researchers to subjectively choose sampling units
Haphazard or convenience sampling includes elements that are most accessible or easiest to contact, based solely on convenience
Judgment or purposive sampling selects respondents based on the judgment or opinion of the researcher, which can introduce personal bias and exclude other units
Quota sampling subdivides the population into subgroups, determines a quota for each stratum, and fills the quota using convenience or judgment sampling
Snowball sampling starts with an initial sample, which may be taken by simple random sampling (SRS), and expands through referrals from those respondents, often through their social networks
Advantages of nonprobability sampling include convenience and cost-effectiveness, but limitations include lack of representativeness, bias, and inability to determine sampling error
Observation leads to the formulation of questions or uncertainties that can be answered scientifically
Hypotheses are formulated to explain observations and make quantitative predictions of new observations, often generated after extensive background research and literature reviews
Criteria for designing an experiment include accuracy and precision, where accuracy refers to how close a measurement is to the true value and precision refers to how consistent repeated measurements are with one another
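One way to make the distinction concrete is to summarize repeated measurements by their bias (mean minus the true value) for accuracy and by their standard deviation for precision. The readings below are made-up numbers for illustration, and using bias and standard deviation as the two summaries is an assumption of this sketch rather than something the notes prescribe.

```python
import statistics

def bias(measurements, true_value):
    """Accuracy summary: how far the average measurement is from the true value."""
    return statistics.mean(measurements) - true_value

def spread(measurements):
    """Precision summary: sample standard deviation of repeated measurements."""
    return statistics.stdev(measurements)

true_value = 10.0  # hypothetical known reference value

# Made-up readings: centered on the true value but widely scattered
accurate_but_imprecise = [9.0, 11.0, 8.5, 11.5, 10.0]
# Made-up readings: tightly clustered but centered away from the true value
precise_but_inaccurate = [12.1, 12.0, 11.9, 12.0, 12.0]

for label, data in [("accurate but imprecise", accurate_but_imprecise),
                    ("precise but inaccurate", precise_but_inaccurate)]:
    print(label, "bias:", round(bias(data, true_value), 3),
          "spread:", round(spread(data), 3))
```

The first set has essentially no bias but a large spread; the second has a small spread but a bias of about 2 units.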