Data management involves organizing, storing, and handling data to ensure accuracy, availability, and security. It enables actionable insights, operational efficiency, competitiveness, and compliance with regulations.
Statistics aids in data management by providing tools to summarize, analyze, and interpret data. It ensures data quality, supports data-driven decisions, and helps extract insights from complex datasets for better management and decision-making.
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting quantitative or numerical data. It transforms numbers into useful information and has two main categories: descriptive and inferential.
Descriptive Statistics: Involves collecting, organizing, and presenting data to provide a clear overview, focusing on central tendency and variability. It is used to summarize and describe the features of a dataset.
Descriptive Statistics is concerned with describing the characteristics and properties of a group of persons, places, or things of interest.
Inferential Statistics: Goes beyond describing data by making predictions or inferences about a population based on a sample. It includes techniques like hypothesis testing, confidence intervals, and regression analysis to draw conclusions and make decisions with a known level of uncertainty.
Inferential Statistics consists of methods that use sample results to help make decisions or predictions about a population.
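The contrast between the two branches can be sketched in code. The example below uses a hypothetical sample to compute a descriptive summary (the sample mean) and an inferential one (a 95% confidence interval for the population mean, under a normal approximation); the data and interval method are illustrative assumptions, not part of the notes above.

```python
# Descriptive vs. inferential statistics on a hypothetical sample.
from statistics import mean, stdev, NormalDist
from math import sqrt

sample = [12, 15, 14, 10, 13, 16, 11, 14, 15, 13]  # hypothetical sample data
n = len(sample)

# Descriptive: summarize the sample itself.
m = mean(sample)

# Inferential: estimate the population mean with a 95% confidence interval,
# using the normal approximation for the sampling distribution of the mean.
se = stdev(sample) / sqrt(n)       # standard error of the mean
z = NormalDist().inv_cdf(0.975)    # ~1.96 for 95% confidence

lower, upper = m - z * se, m + z * se
print(f"sample mean = {m:.2f}, 95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```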
Advancements in technology are transforming statistics, enabling effective use of big data. This chapter explores Excel, a popular tool in statistics. Excel is a powerful tool for managing, analyzing, and visualizing data efficiently for statistical applications.
Population – consists of all elements whose characteristics are being studied.
Sample – a portion of the population selected for study
Constant - characteristics of objects that do not vary
Variable – a characteristic or condition that can change
Data – the values associated with a variable
Parameter – a descriptive measure of a population
A measure of central tendency or position is a single figure which is representative of a general level of magnitudes or values of items in a set of data.
The three most common measures of central tendency:
Mean — the sum of the values of the items divided by the number of items.
Median — the middle value in an ordered set of data.
Mode — the most frequently occurring value in a set of data.
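The three measures above can be computed with Python's built-in statistics module; the dataset here is hypothetical.

```python
# Mean, median, and mode using Python's built-in statistics module.
from statistics import mean, median, mode

values = [2, 3, 3, 5, 7, 10]  # hypothetical dataset

print(mean(values))    # sum of items / number of items -> 5
print(median(values))  # middle value of the ordered data -> 4.0
print(mode(values))    # most frequently occurring value -> 3
```

Note that with an even number of items, the median is the average of the two middle values, which is why it is 4.0 here.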
Percentiles are values that divide a set of data into 100 equal parts.
Some Properties of Mean, Median, and Mode
Mean is unique
Mean is affected by extremely high and extremely low values, called the outliers.
Median is used when one must determine whether the values fall into the upper half or lower half of the distribution
Mode can be used when the data is nominal
Median is also a measure of location.
Quartiles – values that divide a set of observations into 4 equal parts.
25% of data falls below Q1
50% of data falls below Q2 (the median)
75% of data falls below Q3
Deciles – values that divide a set of observations into 10 equal parts.
30% of the data falls below D3
80% of the data falls below D8
etc.
Percentiles – values that divide a set of observations into 100 equal parts.
40% of the data falls below P40
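The quartile, decile, and percentile cut points described above can be computed with `statistics.quantiles` (Python 3.8+); the 11 observations here are hypothetical.

```python
# Quartiles, deciles, and percentiles via statistics.quantiles.
from statistics import quantiles

data = list(range(1, 12))  # hypothetical observations: 1 through 11

q = quantiles(data, n=4)    # 3 cut points -> 4 equal parts (quartiles)
d = quantiles(data, n=10)   # 9 cut points -> 10 equal parts (deciles)
p = quantiles(data, n=100)  # 99 cut points -> 100 equal parts (percentiles)

print(q)      # quartiles Q1, Q2, Q3
print(d[2])   # D3: 30% of the data falls below it
print(p[39])  # P40: 40% of the data falls below it
```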
A measure of variability is a value that describes the spread or dispersion of a set of data points.
The range is the difference between the highest and lowest values in a dataset.
The variance is the average of the squared deviations from the mean.
The standard deviation is the square root of the variance.
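All three measures of variability can be computed directly; the data below are hypothetical, and the population formulas (`pvariance`, `pstdev`) are used.

```python
# Range, variance, and standard deviation of a small hypothetical dataset.
from statistics import pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population data (mean = 5)

data_range = max(data) - min(data)  # highest value minus lowest value
var = pvariance(data)               # average squared deviation from the mean
sd = pstdev(data)                   # square root of the variance

print(data_range, var, sd)  # range == 7, variance == 4, std dev == 2.0
```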
Correlation analysis is a group of statistical techniques to measure the association between two variables.
Correlation coefficient is a measure of the relative strength of a linear relationship between two numerical variables.
Simple linear regression is a fundamental tool in statistics for understanding the relationship between two variables and making predictions based on that relationship.
The dependent variable (y), also known as the response variable, is the variable being predicted or estimated.
The independent variable (x), also known as the predictor or explanatory variable, is the variable believed to have an impact on the dependent variable.
The intercept, denoted by a, is the expected mean value of the dependent variable when the independent variable is set to zero.
The slope, b, is the average change in the dependent variable for every unit change in the independent variable.
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable that is explained or accounted for by the variation in the independent variable.
Logic is the study or science of correct reasoning
A proposition (p), or logic statement, is a statement that is either true or false but not both simultaneously.
A propositional variable, represented by a lowercase or capital letter of the English alphabet, is used to denote an arbitrary proposition with an unspecified truth value.
A proposition that conveys a single idea with no connecting words, and can be represented by only one propositional variable, is called a simple proposition.
A compound proposition is a proposition formed by combining two or more simple propositions using some connecting words.
A logical operator is a connecting word used to construct compound propositions by combining simple propositions.
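The behavior of the basic logical operators on two propositions p and q can be shown as a truth table; the sketch below enumerates every combination of truth values.

```python
# Truth table for negation, conjunction (and), and disjunction (or).
from itertools import product

print(f"{'p':<6}{'q':<6}{'not p':<7}{'p and q':<9}{'p or q'}")
for p, q in product([True, False], repeat=2):
    # Each row shows one combination of truth values for p and q.
    print(f"{str(p):<6}{str(q):<6}{str(not p):<7}{str(p and q):<9}{str(p or q)}")
```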