Introduction to data management

Data are raw facts that become useful information when organized in a meaningful way.
Data Management is concerned with “looking after” and processing data.
Data Management involves the following: looking after field data sheets, checking and correcting the raw data, preparing data for analysis, documenting and archiving the data and meta-data.
Data Management ensures that data for analysis are of high quality so that conclusions are correct.
Good data management allows further use of the data in the future and enables efficient integration of results with other studies.
Good data management leads to improved processing efficiency, improved data quality, and improved meaningfulness of the data.
Sampling and experimentation: planning and conducting a study

A census is the procedure of systematically acquiring and recording information about all members of a given population.
Researchers rarely survey the entire population for two reasons: the cost is too high and the population is dynamic in that the individuals making up the population may change over time.
A sample survey is a method in which a subset (sample) of a population is selected in order to yield some knowledge about the population of concern.
The three main advantages of sampling are that the cost is lower, data collection is faster, and since the data set is smaller, it is possible to improve the accuracy and quality of the data.
An experiment is performed when there are controlled variables (such as a particular treatment in medicine) and the intention is to study their effect on other, observed variables (such as the health of patients).
One of the main requirements for an experiment is the possibility of replication.
An observational study is appropriate when there are no controlled variables and replication is impossible.
This type of study typically uses a survey.
An example of an observation study is one that explores the correlation between smoking and lung cancer.
A good survey must be representative of the population.
Simple Random Sampling (SRS) is an example of probability sampling, where all samples of a given size have an equal probability of being selected and selections are independent.
SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that doesn’t reflect the makeup of the population.
Nonresponse effects may turn any probability design into a nonprobability design if the characteristics of nonresponse are not well understood, since nonresponse effectively modifies each element’s probability of being sampled.
The frame is not subdivided or partitioned in SRS, which makes it relatively easy to estimate the accuracy of results.
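As an illustration, the following minimal Python sketch draws a simple random sample without replacement; the population of 1000 ID numbers and the sample size of 50 are assumptions made for the example, not values from the text.

# Python sketch: simple random sampling (SRS) without replacement.
# The population (1000 hypothetical ID numbers) and sample size are assumptions.
import random

population = list(range(1, 1001))   # e.g. ID numbers of all population members
sample_size = 50

# random.sample draws without replacement; every subset of 50 IDs
# has the same probability of being selected.
srs = random.sample(population, sample_size)
print(sorted(srs)[:10])
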
Techniques such as Systematic Sampling and Stratified Sampling attempt to overcome this problem by using information about the population to choose a more representative sample.
In Probability Sampling it is possible both to determine which sampling units belong to which sample and the probability that each sample will be selected.
Investigators may be interested in research questions specific to subgroups of the population.
SRS cannot accommodate the needs of researchers in this situation because it does not provide subsamples of the population.
Nonprobability sampling does not allow the estimation of sampling errors.
Convenience sampling is an example of nonprobability sampling, for instance asking questions of whichever customers happen to be in a supermarket.
The selection of elements in nonprobability sampling is based on criteria other than randomness, which gives rise to exclusion bias.
Nonprobability sampling is any sampling method where some elements of the population have no chance of selection or where the probability of selection can’t be accurately determined.
Information about the relationship between sample and population in nonprobability sampling is limited, making it difficult to extrapolate from the sample to the population.
Stratified sampling addresses this weakness of SRS: the population is divided into subgroups (strata) and a random sample is drawn within each, so every subgroup of interest is represented in the sample.
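A possible Python sketch of stratified sampling with proportional allocation is shown below; the two strata and their sizes are invented for illustration.

# Python sketch: stratified sampling with proportional allocation.
# The strata ("urban"/"rural") and their sizes are hypothetical.
import random

strata = {
    "urban": list(range(0, 600)),     # 600 population members
    "rural": list(range(600, 1000)),  # 400 population members
}
total_sample = 100
population_size = sum(len(units) for units in strata.values())

stratified_sample = []
for units in strata.values():
    # Proportional allocation: each stratum contributes in proportion to its
    # share of the population, so every subgroup is represented.
    n_stratum = round(total_sample * len(units) / population_size)
    stratified_sample.extend(random.sample(units, n_stratum))

print(len(stratified_sample))  # about 100, split roughly 60/40 between strata
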
Quota sampling is another example of nonprobability sampling, where judgment is used to select the subjects based on specified proportions.
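The bookkeeping behind quota sampling can be sketched as below; note that which individuals are approached remains a matter of convenience and judgment rather than chance, and the quotas and respondent data are invented for the example.

# Python sketch: filling quotas in quota sampling.
# Only the quota accounting is algorithmic; the respondents themselves
# arrive by convenience/judgment, not by random selection.
from collections import defaultdict

def quota_sample(respondents, group_of, quotas):
    """Accept respondents in the order they become available until each
    group's quota (dict of group -> required count) is filled."""
    counts = defaultdict(int)
    sample = []
    for person in respondents:
        g = group_of(person)
        if counts[g] < quotas.get(g, 0):
            counts[g] += 1
            sample.append(person)
        if all(counts[grp] >= q for grp, q in quotas.items()):
            break
    return sample

# Hypothetical example: a 60/40 female/male quota of 10 subjects.
people = [{"id": i, "sex": "F" if i % 3 else "M"} for i in range(100)]
print(len(quota_sample(people, lambda p: p["sex"], {"F": 6, "M": 4})))
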
To make use of probabilistic results, a survey must incorporate an element of chance in the selection of respondents, for example by using a random number generator.
Even when the frame is correctly specified, the subjects may choose not to respond or may not be able to respond.
The wording of survey questions must be neutral; subjects give different answers depending on the phrasing.
Cluster Sampling is an example of two-stage random sampling: in the first stage a random sample of areas is chosen; in the second stage a random sample of respondents within those areas is selected.
Cluster sampling generally increases the variability of sample estimates above that of simple random sampling, depending on how much the clusters differ from one another compared with the variation within clusters.
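A two-stage cluster sample might be simulated as in the sketch below; the 20 areas and 50 respondents per area are assumed values, not from the text.

# Python sketch: two-stage cluster sampling.
# Stage 1 draws a random sample of areas (clusters); stage 2 draws a random
# sample of respondents within each selected area. The frame is hypothetical.
import random

areas = {f"area_{a}": [f"area_{a}_resp_{r}" for r in range(50)]
         for a in range(20)}

n_areas = 5       # stage 1: how many areas to select
n_per_area = 10   # stage 2: respondents per selected area

selected_areas = random.sample(list(areas), n_areas)
sample = []
for area in selected_areas:
    sample.extend(random.sample(areas[area], n_per_area))

print(len(sample))  # 5 areas x 10 respondents = 50
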
Systematic sampling helps to spread the sample over the list.
Taking every 10th (more generally, every k-th) record is especially useful for efficient sampling from databases.
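A systematic "every k-th record" sample can be taken as in the sketch below; the 1000 records and the interval k = 10 are assumptions for the example.

# Python sketch: systematic sampling from an ordered list or database table.
# A random starting point keeps the design probabilistic; then every k-th
# record is taken, which spreads the sample evenly over the list.
import random

records = list(range(1, 1001))  # e.g. 1000 rows in a database table
k = 10                          # sampling interval ("every 10th record")

start = random.randrange(k)     # random start in [0, k)
systematic_sample = records[start::k]

print(len(systematic_sample), systematic_sample[:5])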