EBVM

Cards (51)

  • A census examines a population as a whole, whilst a survey examines only part of the group from which information is required (the study population) in order to draw conclusions about the population overall; this is especially useful when access to the entire target population is not available.
  • A target population is the entire group of interest for research
    A study population is the sample units able to be accessed for survey
    A sampling unit is the thing being sampled e.g. cows
    A sampling frame is the list of all sample units e.g. taken from cattle movement records
  • When we conduct a survey we can either choose which individuals or sampling units to include (non-probability sampling, where the probability of selection varies) or we can sample randomly (probability sampling). In probability sampling, the selection process is deliberately unbiased and each individual in the study population has the same probability of being selected.
  • Non-random sampling may bias results because the surveyor, or the person providing sampling units, may unconsciously choose uniform ‘average’ individuals, which sways the results and makes them unrepresentative.
  • An elementary unit is the basic unit of observation or analysis in a population from which data is collected. It could be a person, object, household, company, or any other individual entity that is part of the study population.
  • A stratum is a subgroup or layer within a population that shares certain characteristics. In stratified sampling, the population is divided into different strata based on a specific variable (e.g., age, income level, or geographic region). Each stratum is treated as a separate population, and samples are drawn from each one.
  • The sampling fraction refers to the ratio of the number of units in the sample to the total number of units in the population. It indicates the proportion of the population that has been selected for the sample. Uses the equation sampling fraction = (sample size/ population size)
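The sampling fraction equation above can be sketched in a couple of lines of Python (the herd numbers are invented for illustration):

```python
def sampling_fraction(sample_size, population_size):
    """Sampling fraction = sample size / population size."""
    return sample_size / population_size

# e.g. a sample of 250 cows from a population of 5000:
fraction = sampling_fraction(250, 5000)  # 0.05, i.e. a 5% sample
```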
  • Purposive sampling involves deliberate selection by the investigator; such samples tend to be more representative than convenience samples, but some bias usually remains.
    Convenience sampling involves selection by the investigator based on conveniently available test subjects. This does not produce a good representation of the population.
  • In simple random sampling a sampling frame is drawn up and a random subset of individuals is selected.
    In cluster sampling (used where a list of all individuals is unavailable) the sampling frame becomes aggregates of individuals in a defined area, from which a random sample is then taken. Less precise than a simple random sample, even if the same number of individuals is sampled.
    Stratified sampling involves a random sample being taken from groups with specific characteristics of interest (in proportion to their occurrence in the population). More precise than SRS of the same size.
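A minimal sketch of these three sampling schemes in Python, using an invented herd of 100 cows spread across 10 farms (all numbers are hypothetical):

```python
import random

random.seed(1)

# Hypothetical herd: 100 cows, 30 of breed "A" and 70 of breed "B",
# spread across 10 farms (clusters) of 10 cows each.
population = [{"id": i, "breed": "A" if i < 30 else "B", "farm": i % 10}
              for i in range(100)]

# Simple random sampling: draw 20 cows directly from the sampling frame.
srs = random.sample(population, 20)

# Stratified sampling: sample each breed in proportion to its occurrence
# (30% A, 70% B), so 6 of breed A and 14 of breed B.
strata = {"A": [c for c in population if c["breed"] == "A"],
          "B": [c for c in population if c["breed"] == "B"]}
stratified = random.sample(strata["A"], 6) + random.sample(strata["B"], 14)

# Cluster sampling: no individual list needed; pick 2 farms at random
# and sample every cow on those farms.
farms = random.sample(range(10), 2)
cluster = [c for c in population if c["farm"] in farms]
```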
  • Sample size determination in survey design matters because, when we use a sample to estimate a population parameter, the larger the sample the more precisely we estimate that parameter. Hence, before we conduct a survey it is important to calculate sample size to ensure that our survey will produce an estimate with adequate precision. A larger sample also provides a better chance of spotting differences and therefore has greater statistical POWER
  • Basic approach to detecting presence of disease:
    The assumed minimum prevalence or endemic prevalence of a disease directly influences the sample size needed to detect its presence. The rarer the disease (lower prevalence), the larger the sample size required to reliably detect it, and vice versa. Understanding this relationship is critical for planning effective disease surveillance and testing programs to ensure that the disease presence (or absence) is accurately determined.
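One common formula for this (assuming a perfect test and a large population; the card itself does not specify a method) is n = ln(1 − confidence) / ln(1 − minimum prevalence):

```python
import math

def detection_sample_size(min_prevalence, confidence=0.95):
    """Sample size needed to detect at least one case with the given
    confidence, assuming a perfect test and a large population:
    n = ln(1 - confidence) / ln(1 - minimum prevalence)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - min_prevalence))

# The rarer the disease, the larger the sample needed:
n_common = detection_sample_size(0.10)  # ~29 animals at 10% prevalence
n_rare = detection_sample_size(0.01)    # ~299 animals at 1% prevalence
```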
  • Large samples vary less than small samples. Confidence intervals are a range of values within which we are confident that the true population value lies (usually at the 95% level).
    For example, we might conduct a large survey of cattle in a country and estimate that the prevalence of bovine virus diarrhoea (BVD) is 12%, with a 95% confidence interval of 9.5%–14%. This means that in our survey sample 12% of animals had BVD and that, based on our survey results, we are 95% confident that the prevalence in the population sits between 9.5% and 14%.
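The BVD example can be reproduced with a normal-approximation confidence interval (the survey counts below are invented to match the 12% figure):

```python
import math

def prevalence_ci(cases, n, z=1.96):
    """Approximate 95% CI for a prevalence estimate using the normal
    approximation: p +/- z * sqrt(p * (1 - p) / n)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# Hypothetical BVD survey: 120 positives out of 1000 cattle sampled.
p, lo, hi = prevalence_ci(120, 1000)
# p = 0.12 and the interval is roughly 0.10 to 0.14; quadrupling the
# sample size would halve the standard error and narrow the interval.
```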
  • In SRS, the estimates (e.g., mean or proportion) are generally more precise, assuming that all units in the population are equally likely to be selected. Since the sample is random and independent, the standard error of the estimate is relatively low. Confidence intervals for estimates derived from SRS tend to be narrower because of the lower standard error, implying more precise estimates.
  • Cluster sampling often has higher variance and larger standard errors compared to SRS because units within the same cluster tend to be more similar. This reduces the effective sample size in terms of variance reduction. Cluster sampling results in wider confidence intervals because of increased intra-cluster correlation, leading to higher variance and standard error.
  • scales of measurement for data:
    • Nominal: AKA categorical. Grouped but unordered categories e.g. eye colour, sex, breed
    • Ordinal: AKA ranked. Ordered categories e.g. small, medium, large; BCS. Sometimes we use numbers to represent these measurements but it is important to remember that behind the numbers there is a set of categories ‐ not a true numerical scale.
    • Discrete counts: whole numbers of things
    • Continuous: Genuine numerical scaling including fractions (can be subgrouped into INTERVAL data, where zero is just another value on the scale e.g. temperature in °C; or RATIO data, where zero means none of the quantity e.g. weight)
  • statistical hypothesis testing is needed to make decisions about populations based on samples without being fooled by chance.
    It reduces the risk of coming to incorrect conclusions due to statistical chance in a test.
    A null hypothesis is set up, then we compute the probability of seeing an effect at least as large as that seen in the trial if the null hypothesis were true (the p value)
  • Type I error - You conclude that there is a difference (effect) when there is not one; i.e., reject a true null hypothesis.
    Type II error - You fail to spot a difference (effect) that is really there; i.e., fail to reject a false null hypothesis.
  • A study with a low Type II error rate (one that is good at spotting effects that are there) is called a powerful study. Three important factors influence power:
    • Sample size: the bigger the sample, the higher the power.
    • Effect size: the bigger the effect (or the difference between populations), the easier it is to spot, and hence the greater the power.
    • Individual variation: the more individuals vary, the harder it is to spot a given population difference or effect against the background variability.
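The effect of sample size on power can be illustrated with a small simulation (the proportions, sample sizes and the two-sample z-test are my own illustrative choices, not from the cards):

```python
import math
import random

random.seed(2)

def power_sim(p1, p2, n, trials=2000, z_crit=1.96):
    """Estimate power by simulation: the fraction of trials in which a
    two-sample z-test for proportions rejects the null at alpha = 0.05."""
    rejections = 0
    for _ in range(trials):
        # Simulate one study: n animals per group, true risks p1 and p2.
        x1 = sum(random.random() < p1 for _ in range(n))
        x2 = sum(random.random() < p2 for _ in range(n))
        p_pool = (x1 + x2) / (2 * n)
        se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
        if se > 0 and abs(x1 / n - x2 / n) / se > z_crit:
            rejections += 1
    return rejections / trials

# Bigger sample -> higher power for the same effect size:
low_n = power_sim(0.10, 0.25, n=50)
high_n = power_sim(0.10, 0.25, n=200)
```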
  • limitations of the chi squared test:
    The chi-squared test only detects associations between variables; it does not imply causality.
    In observational studies, where variables are not controlled (as in experiments), any association found by the chi-squared test cannot establish a cause-and-effect relationship. Confounding variables may influence the observed relationship.
    its limitations—such as its sensitivity to sample size, assumption of independence, inability to adjust for confounding variables, and lack of causal insight—mean that it should be interpreted with caution.
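As a sketch, the chi-squared statistic for a 2x2 table can be computed directly (the counts below are made up; a significant result indicates association only, not causation):

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 table:
        exposed:   a diseased, b healthy
        unexposed: c diseased, d healthy
    Compared against the 3.841 critical value (alpha = 0.05, 1 df)."""
    n = a + b + c + d
    # Shortcut formula for 2x2 tables: n(ad - bc)^2 over the product
    # of the row and column totals.
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

stat = chi_squared_2x2(30, 70, 15, 85)
significant = stat > 3.841  # association, but NOT evidence of causation
```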
  • In hypothesis testing we ask ‘is there a difference between two groups?’ whereas in estimation we ask ‘how big is the difference between the two groups?’.
    So, in summary, estimation lets us know the magnitude of the difference (with a confidence interval describing our uncertainty).
  • We can divide studies looking at influence of therapies and risk factors on disease occurrence into two types: observational studies and clinical trials. Observational studies do not change anything in the animals they study. They rely on variation in treatment and risk factor exposures that occur naturally.
  • Cross-sectional study: a random sample is taken, each animal’s disease status assessed and simultaneously assessed for exposure to risk factor. We can then estimate the prevalence of disease (proportion with the disease) in the at‐risk group and the other group.
    The ratio of these two prevalence values is called the prevalence ratio. If it is equal to one, it suggests no association of disease occurrence with the risk factor. If greater than one it suggests that animals exposed to the risk factor are more likely to have disease. We have to be careful about inferring causality from CS studies
  • Cohort study: random sample of the population, classify individuals into unexposed and exposed to risk factor. An observational study: we don’t intervene. Individuals ‘followed’ for study period and any developing disease is recorded. By observing onset of new disease we can estimate incidence. Cumulative incidence = proportion of healthy animals that become ill during study period. In analysis we compare the cumulative incidence of disease (risk) in the exposed group vs the control group.
  • In cohort studies, if ratio of disease risk in the exposed group vs control group (relative risk) is greater than one, it suggests exposure is associated with increased disease risk. If it is less than one, it suggests that exposure is associated with decreased disease risk. Although we still cannot use a cohort study to demonstrate rigorously causality, the idea that risk factor exposure precedes disease is more convincing support for the risk factor.
  • Case-control study: When disease is rare we ask to be informed about cases of disease. We can also solicit details of animals from the same practice that do not have the disease. Then, presence/absence of a putative risk factor in the two groups is compared. If a risk factor is more common in the cases (animals with disease) than in the controls (healthy animals) it suggests an association. We cannot calculate prevalence ratios or relative risks in case-control studies. Instead, we calculate an odds ratio. If >1, it suggests an association between risk factor and disease
  • In a cross-sectional study, if the prevalence ratio is greater than one, it is a positive association. E.g. if it were 5, the prevalence would be 5x greater in the exposed group than in the unexposed group.
    • In a full study, we could also calculate a confidence interval to give a range likely to contain the population value
  • Relative risk: The ratio of the probability (or risk) of an event (e.g., developing a disease) occurring in the exposed group to the probability of the event occurring in the non-exposed group.
    defined by equation: RR= (incidence rate of disease in exposed group / incidence rate in non-exposed group)
  • Relative risk:
    RR = 1: No association between exposure and disease.
    RR > 1: Positive association (increased risk with exposure).
    RR < 1: Negative association (decreased risk with exposure).
  • Odds ratio: The ratio of the odds of an event occurring in the exposed group to the odds of the event occurring in the non-exposed group.
    OR = ( [no. cases in exposed group / no. non-cases in exposed group ] / [no. cases in non-exposed group / no. non-cases in non-exposed group] )
  • Prevalence ratio: The ratio of the prevalence of a disease in the exposed group to the prevalence of the disease in the non-exposed group. Used in cross-sectional studies where prevalence (not incidence) of disease is measured at a single point in time.
    • PR = (prevalence of disease in exposed group / prevalence of disease in non-exposed group)
  • odds ratios will be very similar to prevalence ratios (used in cross‐sectional studies) and relative risks (used in cohort studies) when the disease is rare. When disease is more common, the odds ratio tends to be further from one than the relative risk or prevalence ratio is. This means that if you look at two studies, one using an odds ratio and the other a relative risk, you cannot directly compare the numerical results.
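A quick numerical sketch of this point, with invented 2x2 tables: when the disease is rare the odds ratio is close to the risk/prevalence ratio, but when it is common the odds ratio sits further from one:

```python
def measures(a, b, c, d):
    """From a 2x2 table (a/b = diseased/healthy among the exposed,
    c/d = diseased/healthy among the unexposed) compute the risk or
    prevalence ratio and the odds ratio."""
    ratio = (a / (a + b)) / (c / (c + d))  # RR in a cohort, PR in a CS study
    odds_ratio = (a / b) / (c / d)
    return ratio, odds_ratio

# Rare disease: the OR approximates the RR/PR...
rr_rare, or_rare = measures(2, 998, 1, 999)
# Common disease: the OR sits further from 1 than the RR/PR does.
rr_common, or_common = measures(40, 60, 20, 80)
```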
  • When you use results from a study to estimate these measures of association you are really calculating the value for the study and then presenting it as an estimate of what the association is in the WHOLE POPULATION of animals. As with any other estimate, it is only the value from the study. It is then necessary to calculate a confidence interval — a range within which we are, say 95%, confident that the population value lies. The bigger the study, the narrower (more precise) the confidence interval.
  • Even though we see an association in RR, PR or OR of a particular study, this may have been down to chance. A statistical hypothesis test can be used, drawing a P value. If P is low, it means our results would be surprising if there is no difference between the two groups (unexposed and exposed) and so we reject the null hypothesis and decide that there is sufficient evidence of an association.
  • Measures of association tell us the strength of association between risk factor and disease. Confidence intervals describe our uncertainty about this relationship in the population. And statistical tests help to make sure we are not fooled by chance
  • While relative risk (RR) or odds ratios (OR) indicate the strength of an association between exposure and disease, they don’t provide information on the impact of that association.
    Attributable Risk (AR): Measures the absolute risk difference between exposed and non-exposed groups, indicating the number of cases directly due to the exposure.
    Aetiological Fraction (AF): Quantifies the percentage of disease in the exposed group that can be attributed to the exposure, showing the potential preventive benefit.
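Under the usual definitions (AR = risk difference; AF = AR divided by the risk in the exposed group), a sketch with invented incidence figures:

```python
def attributable_measures(risk_exposed, risk_unexposed):
    """Attributable risk (absolute risk difference) and aetiological
    fraction (proportion of disease in the exposed group attributable
    to the exposure). Input risks here are invented for illustration."""
    ar = risk_exposed - risk_unexposed
    af = ar / risk_exposed
    return ar, af

# e.g. cumulative incidence 0.30 in exposed vs 0.10 in unexposed:
ar, af = attributable_measures(0.30, 0.10)
# ar = 0.20: 20 extra cases per 100 exposed animals.
# af is about two thirds: that share of disease in the exposed group is
# attributable to the exposure and could in principle be prevented.
```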
  • Interaction: when two (or more) risk factors together have a different effect on disease risk than we would expect from their individual effects.
  • Observational studies can estimate the association between risk factors and disease, but they cannot establish that a risk factor is the cause.
  • Confounding refers to a situation in which a third variable, called a confounder, influences both the exposure (or independent variable) and the outcome (or dependent variable) in a study, leading to a distorted or biased association between them.
    Confounding can lead to overestimating or underestimating the association between an exposure and an outcome. If confounders are not controlled, the true relationship between exposure and outcome can be masked or exaggerated, leading to misleading conclusions.
  • Good Study Design:
    • Randomization: In experimental studies (e.g., randomized controlled trials), randomization is used to equally distribute confounding variables across study groups, minimizing the impact of confounding. However, in observational studies, randomization is not possible.
    • Matching: Researchers can match participants in different groups based on key confounders (e.g., age, sex) to reduce confounding effects.
    • Restriction: Restricting the study to certain groups (e.g., only non-smokers) can limit the impact of a known confounder but may reduce generalizability.
  • Clinical trials manipulate the treatment of individuals to assess the efficacy of the treatment. The key difference between clinical trials and observational studies is this intentional allocation or manipulation of treatment. It means we can formally make decisions about causality and are much less likely to experience problems of confounding.