Data - refers to any information concerning a population or a sample.
Primary Data - refers to information which is gathered directly from the original source. Example : information that is gathered by an interviewer from an interviewee
Secondary Data - refers to information that is taken from a source that has already collected or published it, rather than from the original source. Examples : information that is taken from a newspaper, published or unpublished book, thesis etc
Qualitative Data - are classified according to categories or attributes that are distinguished by some nonnumeric characteristics. Examples : gender, eye color, blood type, civil status
Quantitative Data - are classified according to numerical characteristics.
Discrete : the values are obtained by counting (exact) Ex. Number of students, Number of patients
Continuous : the values are obtained by measuring. Ex. Temperature, area, distance, height, weight
Scales of Measurement
Measurement : the assignment of a value to a variable
There are four levels of measurement used in preparing data for analysis.
Qualitative : Nominal, Ordinal
Quantitative : Interval, Ratio
Nominal
classifies elements into two or more categories or classes
any numbers used only indicate that the elements are different, not their order or magnitude
Order doesn’t matter
Examples : Gender, hair color, genotype, religious preference, living accommodation
Ordinal
measurements which deal with order or rank
provides information about relative comparison but the degrees or differences are not available
Examples : Class ranking, social classes, the Likert scale, level of satisfaction, educational attainment, time of day
Interval
It is similar to ordinal level data because it has a definite ordering, but the differences between data values are also meaningful.
it also establishes a uniform unit in the scale so that any equal distance between two scores is of equal magnitude.
There is no absolute zero in this scale.
Examples : Temperature (in degrees Celsius or Fahrenheit)
Ratio
It is the interval level modified to include the starting point “zero”
Ratios and proportions between values are meaningful.
There is absolute zero in this scale.
Examples : scores on a test, income earned in a week, number of teachers
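The difference between the interval and ratio levels can be checked with a small calculation. The sketch below uses two invented temperature readings and the standard Celsius to Kelvin conversion. The names to_kelvin, celsius_a and celsius_b are hypothetical and used only for illustration.

```python
# Hypothetical readings, chosen only for illustration.
celsius_a, celsius_b = 10.0, 20.0


def to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (a scale with an absolute zero)."""
    return celsius + 273.15


# Differences are meaningful on an interval scale and survive the conversion.
diff_celsius = celsius_b - celsius_a
diff_kelvin = to_kelvin(celsius_b) - to_kelvin(celsius_a)

# Ratios are not meaningful on an interval scale: 20 degrees Celsius looks
# "twice" 10 degrees Celsius, but not once the scale starts at absolute zero.
ratio_celsius = celsius_b / celsius_a
ratio_kelvin = to_kelvin(celsius_b) / to_kelvin(celsius_a)

print("difference:", diff_celsius, round(diff_kelvin, 2))
print("ratio:", ratio_celsius, round(ratio_kelvin, 3))
```

The difference stays at 10 degrees on both scales, while the ratio changes from 2 to about 1.035, which is why ratios are only interpreted on a scale with an absolute zero.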
Direct or Interview Method - It is a method where there is a person to person interaction: an exchange of ideas between the one soliciting information (interviewer) and the one supplying the information (interviewee). This method is applicable to a small sample or population size
Indirect or Questionnaire Method - A questionnaire is a list of well-planned questions written on paper which can be either personally administered or mailed by the researcher to the respondents
Registration Method - documentary analysis wherein data are gathered from facts or information on file. Examples are births, deaths, licenses, land lines, company registration etc
Observation Method - scientific method of gathering data that makes possible use of all senses to measure or to obtain results from the subject of the study
Experiment Method - A method of collection of data wherein effort is made to control the factors affecting the variable in question. It examines the Cause and Effect of a certain phenomenon
Textual Form - data are presented in paragraph form
Tabular Form - data appear in rows and columns
Graphical Form - uses pictures to visualize data
Frequency Distribution - data are being presented in terms of lists, categories or classes, along with the number of data that fall into each category
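A frequency distribution can be built by counting how many data values fall into each category. The sketch below is only an illustration: the blood_types list is invented sample data, and Counter from the Python standard library does the counting.

```python
from collections import Counter

# Hypothetical sample of blood types (qualitative data), for illustration only.
blood_types = ["O", "A", "B", "O", "AB", "A", "O", "B", "O", "A"]

# Count how many observations fall into each category.
frequency = Counter(blood_types)

# Present the result in tabular form: one row per category.
print("Category  Frequency")
for category, count in sorted(frequency.items()):
    print(f"{category:<10}{count}")
```

The frequencies always add up to the total number of observations, which is a quick way to check the table.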
A measure of central tendency is usually called an average. It is a single value that represents a data set. It is the locator of the center of the data set. The average plays an important role in our daily life and it is an important tool in statistics. Mean, median and mode are the three kinds of “averages”, sometimes called measures of central tendency.
Why is Central Tendency important?
It lets us know what is normal or ‘average’ for a set of data.
It condenses the data set down to one representative value, which is useful when you are working with large amounts of data.
It allows the comparison of one data set to another, as well as one piece of data to the entire data set.
Example: You could easily draw comparisons between the girls’ and boys’ heights by calculating the average height for each sample group.
Mean (balance point)
The mean by definition is the sum of all the values in a set of observations (a data set) divided by the total number of observations. This is also known as the arithmetic average.
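The definition of the mean can be sketched in a few lines. The heights below are invented for illustration, and arithmetic_mean is a hypothetical helper name. The standard library function statistics.mean is used only to cross-check the result.

```python
from statistics import mean

# Hypothetical heights in centimetres, invented for illustration.
girls_heights = [150, 152, 155, 158, 160]
boys_heights = [154, 157, 160, 163, 166]


def arithmetic_mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


girls_mean = arithmetic_mean(girls_heights)
boys_mean = arithmetic_mean(boys_heights)

# The standard library gives the same result as the definition.
assert girls_mean == mean(girls_heights)

print("girls:", girls_mean, "boys:", boys_mean)
```

With one mean per group, the two samples can be compared using a single value each.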
Median (physical middle point)
The median of a set of data values is the middle value of the data set when it has been arranged in either ascending or descending order. That is, from the smallest value to the highest value or vice versa.
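A minimal sketch of finding the median follows. It assumes the usual convention, not stated in the text above, that when the number of values is even the median is the average of the two middle values. The function name median_value and the sample lists are hypothetical.

```python
def median_value(values):
    """Middle value of the data after arranging it in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[middle]
    # Even number of values (assumed convention): average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_sample = [7, 3, 9, 1, 5]   # arranged: 1, 3, 5, 7, 9
even_sample = [7, 3, 9, 1]     # arranged: 1, 3, 7, 9

print(median_value(odd_sample))
print(median_value(even_sample))
```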
Mode (most frequent)
The mode is a statistical term that refers to the most frequently occurring number found in a set of numbers. The mode is found by collecting and organizing the data in order to count the frequency of each result. The result of the highest occurrences is the mode of the set. In a set of data, there is a possibility of having one mode (unimodal), two modes (bimodal) or many modes (multimodal). If no data is repeated, then there is no mode for the list.
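The counting procedure described for the mode can be sketched as below. The function name find_modes and the sample lists are hypothetical. Following the text, a list in which no value is repeated is treated as having no mode, so an empty list is returned.

```python
from collections import Counter


def find_modes(values):
    """Return every value that occurs most often, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)            # frequency of each value
    highest = max(counts.values())      # the highest number of occurrences
    if highest == 1:
        return []                       # no value is repeated: no mode
    return [value for value, count in counts.items() if count == highest]


print(find_modes([2, 3, 3, 5]))         # one mode (unimodal)
print(find_modes([1, 1, 2, 2, 3]))      # two modes (bimodal)
print(find_modes([1, 2, 3]))            # no repeated value
```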
A measure of variation is also called a measure of dispersion. It indicates the level of consistency of the data and is used to describe how the data are distributed. It answers the questions:
How is the data distributed?
Is it clustered in one area or is it really spread out?
Range
It is the simplest measure of variation.
It is greatly affected by extreme values and it is not resistant to change since it only uses the largest (maximum) and smallest (minimum) values.
Not a good measure of variability.
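The sensitivity of the range to extreme values can be seen in a short sketch. The scores are invented for illustration and value_range is a hypothetical helper name.

```python
def value_range(values):
    """Largest (maximum) value minus smallest (minimum) value."""
    return max(values) - min(values)


# Hypothetical test scores, for illustration only.
scores = [70, 72, 75, 78, 80]
scores_with_outlier = scores + [20]   # one extreme value added

print(value_range(scores))
print(value_range(scores_with_outlier))
```

A single extreme score changes the range from 10 to 60 even though the other five scores are unchanged.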
Mean Absolute Deviation (MAD)
It is the average of the absolute differences between the data values and the mean.
Small MAD value indicates clustered data values.
A big MAD value indicates spread out data values.
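A minimal sketch of the mean absolute deviation is shown below, using two invented data sets that share the same mean. The names mean_absolute_deviation, clustered and spread_out are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


# Two hypothetical data sets with the same mean (6).
clustered = [5, 6, 6, 6, 7]
spread_out = [2, 4, 6, 8, 10]

print(round(mean_absolute_deviation(clustered), 2))
print(round(mean_absolute_deviation(spread_out), 2))
```

The clustered set gives a MAD of 0.4 and the spread out set gives 2.4, matching the interpretation above.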
Variance
Represented by the squared Greek letter sigma, σ^2 (population variance), or by s^2 (sample variance).
It is the average of the squared deviations of the data values from their mean. It is used to determine how spread out or clustered the data points are around their average value.
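The computation of the variance can be sketched step by step. The sketch assumes the usual convention that the population variance divides the sum of squared deviations by n, while the sample variance divides by n - 1. The data are invented, and statistics.pvariance and statistics.variance are used only as a cross-check.

```python
from statistics import pvariance, variance

# Hypothetical data set, for illustration only.
data = [2, 4, 6, 8, 10]

n = len(data)
mean_value = sum(data) / n

# Square the deviation of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in data]

population_variance = sum(squared_deviations) / n        # sigma squared
sample_variance = sum(squared_deviations) / (n - 1)      # s squared

# Cross-check against the standard library.
assert abs(population_variance - pvariance(data)) < 1e-9
assert abs(sample_variance - variance(data)) < 1e-9

print(population_variance, sample_variance)
```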
Standard Deviation
It is also a reliable measure of dispersion. It is the square root of the variance.
It is used to quantify the amount of variation or dispersion of a set of data values and is represented by the Greek letter sigma σ (population) or the Latin letter s (sample).
A low standard deviation is an indicator that the data points tend to be close to the mean, while a higher value indicates a wide spread from the mean. The group with smaller variability is more homogeneous than the group with a bigger variability.
Variance measures the dispersion of a set of data points around their mean.
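The comparison of two groups by standard deviation can be sketched as follows. Both groups are invented and share the same mean. The population form (dividing by n) is assumed here, and statistics.pstdev is used only to cross-check the hypothetical helper population_std_dev.

```python
from math import sqrt
from statistics import pstdev


def population_std_dev(values):
    """Square root of the population variance."""
    n = len(values)
    center = sum(values) / n
    variance_value = sum((x - center) ** 2 for x in values) / n
    return sqrt(variance_value)


# Two hypothetical groups with the same mean (6).
group_a = [5, 6, 6, 6, 7]
group_b = [2, 4, 6, 8, 10]

sd_a = population_std_dev(group_a)
sd_b = population_std_dev(group_b)

# Cross-check against the standard library.
assert abs(sd_a - pstdev(group_a)) < 1e-9
assert abs(sd_b - pstdev(group_b)) < 1e-9

print(round(sd_a, 3), round(sd_b, 3))
```

Group A has the smaller standard deviation, so it is the more homogeneous of the two groups.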
Advantages of Variance and Standard Deviation
Takes into account every value in the data set.
Most reliable measure of variability.
Mathematically logical
Amenable to further mathematical manipulations.
Disadvantages of Variance and Standard Deviation
Harder to compute and more difficult to understand.
Generally affected by extreme values, which may or may not skew the data.
When to use the Variance and Standard Deviation?
When a dependable measure of variability is needed.
When further statistical analysis is needed.
Most widely used measure of variability and easiest to handle algebraically.
Measures of Relative Position - These determine the position of a single value in relation to the other values in a sample or population data set. Also called fractiles. The score distribution is divided into a number of equal parts.
Quartile (k = 1, 2, 3) - a measure of position that divides the ordered observations or score distribution into 4 equal parts, using three cut points Q1, Q2 and Q3
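Quartiles can be computed with statistics.quantiles from the Python standard library (available from Python 3.8). This is only a sketch on invented scores. Several interpolation rules for quartiles exist, and the default "exclusive" method used here is one of them, so other textbooks or tools may give slightly different values.

```python
from statistics import median, quantiles

# Hypothetical ordered scores, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the cut points that split the data into four equal parts.
# Three cut points are returned: Q1, Q2 and Q3.
q1, q2, q3 = quantiles(scores, n=4)

# The second quartile is the median of the data.
assert q2 == median(scores)

print(q1, q2, q3)
```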