Population: The entire group of individuals or instances about whom we want to draw conclusions.
Sample: A subset of the population that is selected for study. It is often impractical or impossible to study an entire population, so researchers select a sample and use it to make inferences about the population.
Variable: A characteristic or attribute that can take different values. For example, height, weight, age, and test scores are variables.
Data: The values or observations of a variable that are collected from a sample or population.
Descriptive Statistics: Methods used to summarize and describe the main features of a data set. Examples include measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation).
Inferential Statistics: Techniques used to make predictions or inferences about a population based on a sample of data.
Mean: The average of a set of values. It is calculated by summing all the values and dividing by the number of observations.
Median: The middle value in a data set when the values are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle values.
Mode: The value that appears most frequently in a data set.
Range: The difference between the maximum and minimum values in a data set.
Standard Deviation: A measure of the amount of variation or dispersion in a set of values. It provides an indication of how spread out the values are around the mean.
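The measures defined above can be tried out with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is made-up data chosen only for illustration, and `pstdev` treats the list as a whole population rather than a sample.

```python
import statistics

# Hypothetical test scores, invented for this example.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)            # sum / count = 430 / 5 = 86
median = statistics.median(scores)        # middle value of the sorted list = 85
mode = statistics.mode(scores)            # most frequent value = 85
value_range = max(scores) - min(scores)   # 100 - 70 = 30
pop_sd = statistics.pstdev(scores)        # population standard deviation

print(mean, median, mode, value_range, round(pop_sd, 2))
```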
Normal Distribution: A symmetric, bell-shaped probability distribution.
In a normal distribution, the mean, median, and mode are all equal and located at the center of the distribution.
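The symmetry of the normal distribution can be checked numerically with `statistics.NormalDist` from the Python standard library. This is only a sketch; the mean of 100 and standard deviation of 15 are arbitrary values picked for the example.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Symmetry: the density is equal at the same distance on either side of the mean.
left = dist.pdf(85)
right = dist.pdf(115)

# The median (50th percentile) coincides with the mean.
median = dist.inv_cdf(0.5)

# Share of values within one standard deviation of the mean (roughly 68%).
within_one_sd = dist.cdf(115) - dist.cdf(85)

print(left, right, median, round(within_one_sd, 4))
```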
Hypothesis Testing: A statistical method for deciding whether sample data provide enough evidence to reject a stated assumption about a population (the null hypothesis). It involves formulating null and alternative hypotheses, collecting and analyzing data, and drawing a conclusion.
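As one possible illustration of hypothesis testing, the sketch below tests whether a coin is fair using an exact two-sided binomial test. The function name `binomial_two_sided_p`, the observed counts, and the 0.05 significance threshold are all assumptions made for this example, not part of the notes above.

```python
from math import comb

def binomial_two_sided_p(heads, flips):
    """Exact two-sided p-value for the null hypothesis that the coin is fair."""
    distance = abs(heads - flips / 2)
    # Count every outcome at least as far from flips / 2 as the observed one.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    # Under a fair coin, each of the 2 ** flips sequences is equally likely.
    return extreme / 2 ** flips

# Hypothetical observation: 60 heads in 100 flips.
p_value = binomial_two_sided_p(60, 100)
reject_null = p_value < 0.05   # 0.05 is a conventional, assumed threshold
print(p_value, reject_null)
```

A small p-value means the observed result would be unusual if the null hypothesis were true; it is not the probability that the null hypothesis is true.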
Confidence Interval: A range of values, computed from a sample, used to estimate the true value of a population parameter. The confidence level (for example, 95%) describes how often intervals constructed this way would contain the true value under repeated sampling.
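A confidence interval for a population mean might be computed as in the sketch below. It assumes the normal approximation (mean plus or minus z times the standard error), which is reasonable for larger samples; for small samples a t-distribution is usually preferred. The helper name `mean_confidence_interval` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 divisor
    # z is the standard normal quantile that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return center - margin, center + margin

# Hypothetical sample: the integers 1 through 100.
low, high = mean_confidence_interval(list(range(1, 101)))
print(low, high)
```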
Regression: A statistical method used to examine the relationship between two or more variables. It helps in predicting one variable based on the values of the others.
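The simplest case of regression, a straight line fitted to one predictor by ordinary least squares, can be sketched as follows. The function name `fit_line` and the data points are illustrative assumptions; real analyses typically rely on a statistics library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept   # prediction for x = 5
print(slope, intercept, predicted)
```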
Histograms - A graphical representation of the distribution of numerical data. It consists of bars representing the frequency of data within intervals or bins.
Scatter Plots - Displays the relationship between two variables. Each point on the plot represents a pair of values for the two variables.
Line Charts - Connects data points with straight lines. Useful for showing trends over time or relationships between two variables.
Bar Charts - Represents categorical data with rectangular bars. The length of each bar corresponds to the quantity it represents.
Pie Charts - Divides a circle into sectors to represent data proportions. Useful for showing the composition of a whole.
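The binning step behind a histogram can be sketched without any plotting library. The function `histogram_counts`, the bin width, and the data are assumptions made for illustration; each bin is treated as the half-open interval from its left edge up to, but not including, the next edge.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)   # which bin the value belongs to
        left = start + index * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical data grouped into bins of width 3 starting at 0.
counts = histogram_counts([1, 2, 2, 3, 5, 7, 8, 9], bin_width=3, start=0)

# A text rendering: one '#' per observation in the bin.
for left, count in counts.items():
    print(f"{left}-{left + 3}: {'#' * count}")
```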
Descriptive data analysis is the process of summarizing, organizing, and presenting data in a meaningful and informative way.
Descriptive data analysis - Objective: It aims to describe the main features of a dataset, providing a clear and concise summary that facilitates understanding.
Descriptive data analysis - Methods: Measures of Central Tendency, Measures of Dispersion, Frequency Distribution, Percentiles, Skewness and Kurtosis.
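Three of the methods listed above (frequency distribution, percentiles, and skewness) can be sketched with the standard library. The data are invented, the `skewness` helper is a hypothetical name, and it uses one common definition (the average cubed standardized deviation); other definitions apply small-sample corrections.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles

# Hypothetical data with one large value pulling the right tail out.
data = [1, 2, 2, 3, 3, 3, 4, 10]

# Frequency distribution: how often each value occurs.
frequency = Counter(data)

# Quartiles: the 25th, 50th, and 75th percentiles.
quartiles = quantiles(data, n=4)

def skewness(values):
    """Average cubed standardized deviation (population form)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

print(frequency, quartiles, skewness(data))
```

A positive skewness indicates a longer right tail, a negative value a longer left tail, and a value near zero a roughly symmetric distribution.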
Mean - The arithmetic average of a set of values.
Mean - Properties: Influenced by extreme values (outliers), and it may not be a representative value if the distribution is skewed.
Median - The middle value in a dataset when it is ordered. If there is an even number of observations, the median is the average of the two middle values.
Median - Calculation: Arrange data in ascending order and find the middle value.
Median - Properties: Less sensitive to outliers than the mean; suitable for skewed distributions.
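The difference in outlier sensitivity between the mean and the median shows up in a small experiment. The salary figures below are hypothetical and given in thousands.

```python
from statistics import mean, median

# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]   # add one extreme value

mean_before, median_before = mean(salaries), median(salaries)        # 35 and 35
mean_after, median_after = mean(with_outlier), median(with_outlier)  # 112.5 and 36.5

print(mean_before, median_before, mean_after, median_after)
```

A single extreme value more than triples the mean, while the median moves only from 35 to 36.5.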
Mode - Definition: The value that occurs most frequently in a dataset.
Mode - Properties: A dataset can have no mode, one mode (unimodal), or multiple modes (multimodal).
Mode - Particularly useful for categorical data.
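The three cases (no mode, one mode, several modes) can be distinguished with a small helper. The function `modes` is a hypothetical sketch, and it assumes one particular convention: when every value occurs exactly once, the dataset is reported as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if every value occurs exactly once, there is no mode.
    if highest == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 3]))                # no mode
print(modes([1, 1, 2, 3]))             # unimodal
print(modes([1, 1, 2, 2, 3]))          # multimodal
print(modes(["red", "blue", "red"]))   # categorical data
```

The last call shows why the mode suits categorical data: it needs only counts, not arithmetic on the values.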
Range - The difference between the maximum and minimum values.
Variance - The average of the squared differences from the mean.
Standard Deviation - The square root of the variance, expressed in the same units as the data. It indicates how far values typically lie from the mean.
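Written as formulas, the three measures of dispersion look as follows. The symbols are notation introduced here for illustration: x_1, ..., x_N are the values, N is their count, and \mu is their mean. These are the population forms; when the data are a sample, the variance is commonly computed with N - 1 in place of N.

```latex
\text{Range} = \max_i x_i - \min_i x_i

\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```

For example, for the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared differences sum to 32, the variance is 32 / 8 = 4, and the standard deviation is 2.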