Part 4
4.4 Big data
4.4.1 Sampling
Created by King Mole
Sampling
Examining a subset of the available data, known as the sample, instead of the entire data set
Sampling
Used for a wide range of scientific, political and economic purposes for more than two centuries
Allows researchers to determine patterns and trends without examining the entire data set
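As a minimal sketch of the idea on this card, the Python below estimates the mean of a data set from a simple random sample instead of reading every value. The journey times are synthetic, and the function name `estimate_mean_from_sample` and all numbers are assumptions made for illustration only.

```python
import random
import statistics


def estimate_mean_from_sample(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)
    # Draw sample_size distinct items, each equally likely to be chosen.
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


# Hypothetical data set: 100,000 synthetic journey times in minutes.
data_rng = random.Random(42)
journey_times = [data_rng.gauss(30, 8) for _ in range(100_000)]

full_mean = statistics.mean(journey_times)
sample_mean = estimate_mean_from_sample(journey_times, sample_size=1_000)

print(f"mean of entire data set: {full_mean:.2f}")
print(f"mean of 1,000-item sample: {sample_mean:.2f}")
```

The sample uses 1% of the data, yet its mean should land close to the full mean, because every item had the same chance of being picked.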
Sampling (examples)
Political opinion polls
Monitoring car journeys to study traffic congestion
Representative sample
A sample that accurately reflects the wider population in terms of relevant characteristics
Creating a representative sample is very difficult, because the sample can contain hidden biases or omissions that lead to mistaken conclusions
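One common way to match a sample to the population on a known characteristic is proportional stratified sampling. The card does not name this technique, so the sketch below is an illustrative assumption, as are the `stratified_sample` helper, the `age_group` field and the population counts.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, sample_size, seed=0):
    """Sample each group in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for members in strata.values():
        # Rounded quotas may not sum exactly to sample_size in general.
        quota = round(sample_size * len(members) / len(records))
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical population: 30% aged 18-34, 50% aged 35-64, 20% aged 65+.
population = (
    [{"age_group": "18-34"} for _ in range(300)]
    + [{"age_group": "35-64"} for _ in range(500)]
    + [{"age_group": "65+"} for _ in range(200)]
)

sample = stratified_sample(population, "age_group", sample_size=100)
counts = Counter(record["age_group"] for record in sample)
print(counts)
```

Stratifying only balances the characteristics you chose to stratify on. Biases in characteristics nobody thought to record remain hidden, which is why a representative sample is so hard to guarantee.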
Failure of opinion polls to predict the 2015 UK general election
Polls systematically overrepresented Labour voters at the expense of Conservative voters
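The effect of overrepresenting one group can be sketched with a small simulation. Everything here is hypothetical: the parties are called "A" and "B", the electorate is split evenly, and the assumption that party A voters are twice as likely to be reached is invented for illustration. None of it is real polling data.

```python
import random


def share(voters, party):
    """Fraction of the given voters who support the party."""
    return sum(1 for v in voters if v == party) / len(voters)


# Hypothetical electorate: an even split between two made-up parties.
electorate = ["A"] * 5_000 + ["B"] * 5_000

rng = random.Random(1)

# Representative poll: every voter is equally likely to be chosen.
representative_poll = rng.sample(electorate, 1_000)

# Biased poll: party A voters are assumed twice as likely to be reached.
weights = [2 if v == "A" else 1 for v in electorate]
biased_poll = rng.choices(electorate, weights=weights, k=1_000)

print(f"true share for A:      {share(electorate, 'A'):.2f}")
print(f"representative poll:   {share(representative_poll, 'A'):.2f}")
print(f"biased poll:           {share(biased_poll, 'A'):.2f}")
```

Taking a larger biased sample does not help: the estimate settles on the wrong value because the error comes from who is reached, not from how many.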
Sampling has the drawback that both the sample and the data obtained from the sample are defined at the start
Limitations of sampling in opinion polls
Landline-based polls exclude younger voters without landlines
Internet-only polls favour younger voters
Pollsters cannot ask questions that are not in the poll, even if they realise a key question is missing
Sampling in scientific investigation
Allows researchers to work with a tiny subset of the possible data when time and expense constraints rule out examining all of it
Genetic testing by 23andMe
Samples a relatively small number of genes known to be associated with certain traits and conditions, to keep costs down
Sampling a small number of genes
Means that diseases associated with genes outside the sample, or diseases whose genetic origin is uncertain, cannot be detected
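The limitation on this card can be sketched with sets. The gene names, condition names and the panel below are all hypothetical placeholders, not real genetic associations, and the rule that a condition is detectable only when every associated gene is on the panel is a simplifying assumption.

```python
# Hypothetical mapping from conditions to the genes associated with them.
CONDITION_GENES = {
    "condition_x": {"GENE_1"},
    "condition_y": {"GENE_2", "GENE_3"},
    "condition_z": {"GENE_9"},
}

# Hypothetical test panel: the small set of genes actually sampled.
PANEL = {"GENE_1", "GENE_2", "GENE_3"}


def detectable_conditions(panel, condition_genes):
    """Conditions whose associated genes are all covered by the panel."""
    return {
        condition
        for condition, genes in condition_genes.items()
        if genes <= panel  # subset test: every gene is on the panel
    }


def missed_conditions(panel, condition_genes):
    """Conditions that rely on at least one gene outside the panel."""
    return set(condition_genes) - detectable_conditions(panel, condition_genes)


print("detectable:", sorted(detectable_conditions(PANEL, CONDITION_GENES)))
print("missed:", sorted(missed_conditions(PANEL, CONDITION_GENES)))
```

A condition with no known gene association would not appear in the mapping at all, so it is missed however large the panel is.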
Analysing complete DNA sequences
Avoids the problem of limited sampling, but is computationally intensive and more expensive