Data collection is concerned with the accurate acquisition of data; although methods may differ depending on the field, the emphasis on ensuring accuracy remains the same
Accurate data collection is essential to ensure the integrity of the research, regardless of the field of study or the type of data (quantitative or qualitative)
Data presentation
Method by which people summarize, organize and communicate information using a variety of tools, such as diagrams, distribution charts, histograms and graphs
Coding data
Typically used to give numerical values to measurements that do not automatically have numbers assigned to them
Coding data
Typically used to present non-mathematical values as numbers through electronic means
Stem-and-leaf diagrams
A popular type of data representation that gives a visual picture of numerical information, e.g. people's ages or dates of birth
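As a minimal sketch of how such a diagram is built, the snippet below splits each value into a stem (tens) and a leaf (units). The function name `stem_and_leaf` and the list of ages are hypothetical, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one text row per stem; stem = tens part, leaf = units digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    rows = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

ages = [23, 25, 31, 34, 34, 47, 52, 58]  # hypothetical ages
for row in stem_and_leaf(ages):
    print(row)
```

Each row reads as "stem | leaves", so the row for stem 3 lists the units digits of every age in the thirties.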
Data coding
The process of converting data into a form that can be analyzed, involving assigning numerical or categorical codes to data items
Types of data coding
Nominal coding
Ordinal coding
Dichotomous coding
Numeric coding
Nominal coding
Assigning labels or categories to data items
Ordinal coding
Assigning categories to data items in a specific order
Dichotomous coding
Assigning a binary code (e.g., 0 or 1) to data items
Numeric coding
Assigning numerical values to data items
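The four coding types can be illustrated with a small sketch. The survey fields, labels and code values below are hypothetical, chosen only to show one way each type might look in practice.

```python
# Hypothetical survey responses, all stored as text before coding.
responses = [
    {"colour": "red", "satisfaction": "low", "smoker": "no", "height_cm": "172"},
    {"colour": "blue", "satisfaction": "high", "smoker": "yes", "height_cm": "165.5"},
]

NOMINAL_CODES = {"red": 1, "green": 2, "blue": 3}    # labels only, the numbers carry no order
ORDINAL_CODES = {"low": 1, "medium": 2, "high": 3}   # the order of the codes is meaningful
DICHOTOMOUS_CODES = {"no": 0, "yes": 1}              # exactly two possible codes

def code_response(response):
    """Convert one raw response into its coded form."""
    return {
        "colour": NOMINAL_CODES[response["colour"]],
        "satisfaction": ORDINAL_CODES[response["satisfaction"]],
        "smoker": DICHOTOMOUS_CODES[response["smoker"]],
        "height_cm": float(response["height_cm"]),  # numeric coding
    }

coded = [code_response(r) for r in responses]
print(coded)
```

Keeping the code tables as named dictionaries makes the coding scheme explicit and easy to reverse when results are reported.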
Data classification
A diverse process that involves various methods and criteria for sorting data within a database or repository
Examples and application of data classification
Separating customer data based on gender
Identifying and keeping frequently used data in disk/memory cache
Data sorting based on content/file type, size and time on data
Sorting for security reasons by classifying data into restricted, public or private data types
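The examples above share one idea: pick a criterion and group records by it. A minimal sketch, assuming a hypothetical `classify` helper and made-up file records:

```python
from collections import defaultdict

def classify(records, key):
    """Group records into classes labelled by key(record)."""
    classes = defaultdict(list)
    for record in records:
        classes[key(record)].append(record)
    return dict(classes)

# Hypothetical file records: (name, size in bytes, security label)
files = [
    ("report.pdf", 120_000, "restricted"),
    ("logo.png", 45_000, "public"),
    ("notes.txt", 2_000, "private"),
    ("banner.png", 300_000, "public"),
]

by_type = classify(files, key=lambda f: f[0].rsplit(".", 1)[-1])
by_security = classify(files, key=lambda f: f[2])
print(sorted(by_type))      # class labels found when sorting by file type
print(sorted(by_security))  # class labels found when sorting by security level
```

The same helper works for any criterion, for example grouping customer records by a gender field.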
Data collection
The gathering of a set of observations about variables; it is the starting point of research methods
Types of data
Primary data
Secondary data
Primary data
Data collected for the first time, in raw form and directly from the source
Methods of collecting primary data
Direct personal observation
Indirect oral interviews
Mailed questionnaire method
Schedule method
From local agents
Secondary data
Second-hand information that has already been collected by someone else, generally used when the time available for the enquiry is short and some loss of accuracy is acceptable
Categories of collecting secondary data
Published sources
Unpublished sources
Population
The entire group that you want to draw conclusions about
Sample
The specific group that you will collect data from, with a size less than the total size of the population
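A short sketch of drawing a sample from a population, using simple random sampling as one possible method. The population of 1,000 numbered units and the sample size of 50 are hypothetical.

```python
import random

population = list(range(1, 1001))  # hypothetical population of 1,000 unit IDs
rng = random.Random(42)            # fixed seed so the draw is repeatable

# Simple random sample without replacement: every unit is equally likely to be picked.
sample = rng.sample(population, k=50)

print(len(population), len(sample))
```

Conclusions are drawn about all 1,000 units, but data is collected only from the 50 sampled ones.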
Unbiased
When the average of a large set of measurements will be close to the true value
Precise
When repeated measurements will be close to one another, but not necessarily close to the true value
An estimate of a parameter taken from a random sample is known to be unbiased, and as the sample size increases, it gets more precise
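The difference between unbiased and precise can be seen in a small simulation. This is a sketch under stated assumptions: the population is synthetic, the parameter estimated is the mean, and the sample sizes (10 and 400) and number of repeats are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.uniform(0, 100) for _ in range(10_000)]  # synthetic population
true_mean = statistics.mean(population)

def sample_means(sample_size, repeats=500):
    """Estimate the population mean from `repeats` independent random samples."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]

small = sample_means(10)
large = sample_means(400)

# Unbiased: the average of many estimates sits near the true value.
bias_small = statistics.mean(small) - true_mean
# Precise: the estimates sit near one another (small spread).
spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print(f"bias (n=10): {bias_small:.2f}")
print(f"spread (n=10): {spread_small:.2f}, spread (n=400): {spread_large:.2f}")
```

Both sample sizes give estimates centred on the true mean, but the larger samples give estimates that cluster much more tightly.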
Data presentation and display
Involves more than just drawing graphs, and includes understanding the type of data, the intended audience, and how the information will be used
Decisions should not be made based on graphs alone, as no graph can tell you everything you need to know
The purpose of presenting data graphically
To provide information to assist in decision making and to monitor activities in progress
Ways of displaying or presenting data
Stem and leaf
Time sequence plot
Control chart
Lag plot
Scatter plot
Digidot plot
Dot plot
Histogram
Boxplot
Outlier
A data point that is extremely high or extremely low relative to its nearest neighbours and to the rest of the values in a graph or dataset
Outlier-related concepts
Interquartile range
Determining outliers
Strong outliers
Weak outliers
Descriptive statistics are sensitive to outliers, which is why it is important to check for them
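One common way to check for outliers uses the interquartile range (IQR). The sketch below assumes the widely used convention that weak outliers lie more than 1.5 × IQR beyond the quartiles and strong outliers more than 3 × IQR beyond them; these thresholds are not defined in the notes above, and the data is hypothetical.

```python
import statistics

def find_outliers(values):
    """Split outliers into weak and strong using IQR fences (assumed 1.5 and 3 multipliers)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    strong = [v for v in values if v < outer[0] or v > outer[1]]
    weak = [
        v for v in values
        if (v < inner[0] or v > inner[1]) and v not in strong
    ]
    return weak, strong

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 22, 40]  # hypothetical measurements
weak, strong = find_outliers(data)
print("weak:", weak, "strong:", strong)
```

Quartiles can be computed by several slightly different methods, so values that fall right on a fence may be classified differently by other tools.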
Descriptive statistics
Summary statistics that quantitatively describe or summarize features of a collection of information
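A brief sketch of a few common summary statistics, computed with Python's standard `statistics` module on hypothetical data, with and without one extreme value:

```python
import statistics

clean = [2, 4, 4, 5, 5, 7, 8]   # hypothetical observations
with_outlier = clean + [100]    # same data plus one extreme value

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "stdev:", round(statistics.stdev(data), 2),
    )
```

The mean and standard deviation shift sharply when the extreme value is added, while the median stays where it was.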
Classification
The process of arranging things in groups or classes according to their resemblances and affinities, giving expression to the unity of attributes that may subsist amongst a diversity of individuals
Methods of classification
Equal interval
Quantile
Equal interval classification
The classification scheme divides the range of attribute values into equal-sized sub-ranges, allowing you to specify the number of intervals while it determines where the breaks should be
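A minimal sketch of how the break points might be computed once the number of intervals is chosen. The function name `equal_interval_breaks` and the sample values are hypothetical.

```python
def equal_interval_breaks(values, num_classes):
    """Return the interior break points that split the value range into equal-width classes."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes
    return [low + width * i for i in range(1, num_classes)]

values = [0, 10, 20, 100]  # hypothetical attribute values
print(equal_interval_breaks(values, 4))
```

The breaks depend only on the minimum and maximum, so with skewed data some classes may end up holding very few values.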
Quantile classification
Each class contains an equal number of features, and is well-suited to linearly distributed data
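A minimal sketch of quantile classification: sort the values, then cut the sorted list into classes of equal count. The function name `quantile_classes` and the data are hypothetical.

```python
def quantile_classes(values, num_classes):
    """Split the sorted values into classes holding an equal number of items."""
    ordered = sorted(values)
    n = len(ordered)
    return [
        ordered[i * n // num_classes:(i + 1) * n // num_classes]
        for i in range(num_classes)
    ]

values = [5, 1, 9, 3, 7, 11, 2, 8, 4]  # hypothetical attribute values
for cls in quantile_classes(values, 3):
    print(cls)
```

When the number of values is not an exact multiple of the number of classes, the class sizes in this sketch differ by at most one.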
Finding the number in a data set where 20% of values fall below it and 80% fall above
1. Order the data from smallest to largest
2. Count the number of observations
3. Convert the percentage to a decimal
4. Insert the values into the formula: i-th observation = q(n + 1), where q is the percentage as a decimal and n is the number of observations
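The four steps can be sketched as follows. The formula gives a position in the ordered data; when that position is not a whole number, this sketch interpolates linearly between the two neighbouring observations, which is an assumption not stated in the steps. The function name and data are hypothetical.

```python
import math

def percentile_value(values, q):
    """Value at position q * (n + 1) in the ordered data (positions counted from 1)."""
    ordered = sorted(values)            # step 1: order the data
    n = len(ordered)                    # step 2: count the observations
    position = q * (n + 1)              # steps 3-4: q is the percentage as a decimal
    position = min(max(position, 1), n) # keep the position inside the data
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumed rule: interpolate between the two neighbouring observations.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [16, 3, 9, 5, 13, 7, 15, 8, 11]  # hypothetical data set, n = 9
print(percentile_value(data, 0.20))     # position 0.20 * (9 + 1) = 2, the 2nd observation
```

With nine observations and q = 0.20 the position is exactly 2, so the answer is the second smallest value.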
Natural breaks (Jenks) classification
Classes are based on natural groupings inherent in data, identifying break points by picking the class breaks that best group similar values and maximize the differences between classes
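The idea can be illustrated with a brute-force sketch that tries every way of cutting the sorted values into classes and keeps the one with the smallest total squared deviation from the class means. This is not the Jenks optimization procedure itself, only a simple stand-in that is practical for small data sets; the function names and data are hypothetical.

```python
from itertools import combinations

def sum_sq_dev(group):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)

def natural_breaks(values, num_classes):
    """Brute-force search for the split with the least within-class variation."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_classes = None, None
    for cuts in combinations(range(1, n), num_classes - 1):
        bounds = (0, *cuts, n)
        classes = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes

values = [1, 2, 3, 10, 11, 12, 50, 51]  # hypothetical values with three visible clusters
for cls in natural_breaks(values, 3):
    print(cls)
```

Minimizing variation inside each class is what keeps similar values together and pushes the class breaks into the gaps between clusters.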