Science of Statistics - deals with the collection, analysis, interpretation, and presentation of data
Validity - “Will this study help answer the research question?”
Analysis - “What analysis should be used, and how should the results be interpreted and reported?”
Efficiency - “Is the experiment the correct size, making the best use of resources?”
False Analogy - a comparison or analogy that is technically valid but has little or no practical meaning. The term implies that the comparison was designed to be misleading
Biased labeling - misleading labels on a graph
Biased samples - poor quality sample such as answers to leading questions
Leading question: How short are you?
Loaded question: Where do you enjoy drinking beer?
Double-barreled question: How satisfied or dissatisfied are you with the pay and work benefits of your current job?
Absolutes in questions: Do you always eat breakfast?
Cognitive Biases - misinterpretation of numbers due to flawed logic
Data Dredging - looking for patterns of data using brute force methods that try a large number of statistical models until matches are found
Overcomplexity - graphs and data visualizations that are too complex to be interpreted by the audience. This may prevent the data from being challenged and validated.
Overfitting - testing too many theories against data such that random patterns are sure to be found
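The data dredging and overfitting entries above can be illustrated with a small simulation. This is only a sketch: the function names, the number of candidate variables, and the cutoff of 0.444 (approximately the two-sided 5% critical value of the correlation coefficient for 20 points) are assumptions chosen for illustration. Every candidate variable is pure noise, yet some of them still pass the cutoff.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def dredge(n_candidates=1000, n_points=20, threshold=0.444, seed=0):
    """Count noise variables that look 'significantly' correlated with a noise outcome."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcome = [rng.gauss(0, 1) for _ in range(n_points)]
    hits = 0
    for _ in range(n_candidates):
        noise = [rng.gauss(0, 1) for _ in range(n_points)]
        # No real relationship exists, so every hit is a false pattern.
        if abs(pearson_r(noise, outcome)) > threshold:
            hits += 1
    return hits


print(dredge(), "of 1000 unrelated variables passed the cutoff")
```

With a 5% cutoff, roughly 5% of the unrelated variables are expected to pass, which is why trying many models until one "matches" proves very little.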
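Prosecutor's fallacy - an invalid interpretation of a valid statistic, classically treating the probability of the evidence given innocence as if it were the probability of innocence given the evidence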
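One classic form of the prosecutor's fallacy is to read "the chance of a match if innocent is 1 in 10,000" as "the chance of innocence is 1 in 10,000". A minimal sketch using Bayes' theorem shows how far apart the two can be. The function name and the numbers are hypothetical, and the model assumes exactly one guilty person who always matches and an equal prior chance of guilt for everyone in the population.

```python
def posterior_innocent(p_match_given_innocent, population_size):
    """P(innocent | match) under the simplifying assumptions stated above."""
    prior_guilty = 1 / population_size
    prior_innocent = 1 - prior_guilty
    # Total probability of a match: the guilty person always matches,
    # and each innocent person matches by chance.
    p_match = 1.0 * prior_guilty + p_match_given_innocent * prior_innocent
    # Bayes' theorem: P(innocent | match) = P(match | innocent) * P(innocent) / P(match)
    return p_match_given_innocent * prior_innocent / p_match


# Hypothetical numbers: 1-in-10,000 chance match, city of 1,000,000 people.
print(round(posterior_innocent(1 / 10_000, 1_000_000), 4))
```

Under these assumptions about 100 innocent people would match by chance, so a matching person is still very likely innocent even though the statistic itself is valid.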
Significance - basing analysis on a statistically insignificant number of samples
Tyranny of Averages - term for the overuse of averages in statistical analysis and decision making. It refers to situations in which an average is relatively meaningless because of the shape of the data distribution
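A short sketch of how an average can mislead when the distribution is skewed. The income figures are made up for illustration: one extreme value pulls the mean far away from what is typical, while the median stays close to most of the data.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); one extreme value skews the distribution.
incomes = [30, 32, 35, 38, 40, 1000]

average = mean(incomes)    # pulled upward by the single large value
typical = median(incomes)  # middle of the sorted data

print(f"mean = {average:.1f}, median = {typical}")
```

Here the mean is about 195.8 while the median is 36.5, so reporting only the average would describe nobody in the group well.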
Garbage in, garbage out - observation that processes, procedures, and technologies require meaningful input to produce a meaningful result
Statistics - refers to the procedures and techniques used in the collection, analysis, interpretation, and presentation of data
Descriptive statistics - concerned with the collection, description, and analysis of a set of data without drawing conclusions or inferences about a larger set
Inferential Statistics - concerned with making predictions or inferences about a larger set of data using only the information gathered from a subset of this larger set
Statistical Theory or Mathematical Statistics - deals with the development and exposition of theories that serve as the bases of statistical methods
Dot plot - consists of a number line and dots / points positioned above the number line.
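The dot plot described above can be sketched in plain text. This is a minimal illustration, not a standard plotting routine: the function name dot_plot is hypothetical, and it assumes the data are integers so that each value gets its own column on the number line.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot: one '*' per observation stacked above a number line."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    axis_values = range(lo, hi + 1)
    width = max(len(str(v)) for v in axis_values) + 1  # column width per axis value
    rows = []
    # Build rows from the tallest stack down to the first dot above the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("*" if counts.get(v, 0) >= level else " ").rjust(width)
            for v in axis_values
        )
        rows.append(row.rstrip())
    # The number line itself goes at the bottom.
    rows.append("".join(str(v).rjust(width) for v in axis_values))
    return "\n".join(rows)


print(dot_plot([1, 2, 2, 3, 3, 3, 5]))
```

Each column's height equals how many times that value occurs, so gaps (such as the value 4 here) and clusters are visible at a glance.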
2 ways to summarize data:
Graphing
Using numbers - for example, finding an average
Descriptive statistics - organizing and summarizing data
Statistical Inference - uses probability to determine how confident we can be that our conclusions are correct
Effective interpretation of data / Inference - based on good procedures for producing data and thoughtful examination of the data
Probability - mathematical tool used to study randomness
deals with the chance of an event occurring
Karl Pearson - English statistician: “Statistics is the grammar of science”
Ernest Rutherford - “If your experiment needs statistics, you ought to have done a better experiment.”
Population
Collection of persons, things, or objects under study; the collection of all the elements under consideration in a statistical study
Sample
Selection in a population, part or subset of the population from which the information is collected
Sampling - Select a portion of the larger population and study that portion to gain information about the population
Statistics
Numbers that represent a property of the sample
Parameter
Numerical characteristics of a population
Data
Collection of observations
Observation
Numerical recording of information on a variable
Measurement
Process of determining the value or label of a particular variable for a particular experiment or sampling unit
Experimental / sampling unit
Individual or object on which a variable is measured
Parameter - numerical characteristic of the whole population that can be estimated by a statistic
Variable - characteristic or measurement that can be determined for each member of a population, characteristic or attribute of persons or objects which can assume different values or labels for different persons or objects under consideration
Data - actual values of the variable, may be words or numbers
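The relationship between population, sample, parameter, and statistic defined above can be sketched with simulated data. Everything here is an assumption for illustration: the population is a made-up list of heights in centimetres, and the function name estimate_mean is hypothetical.

```python
import random
from statistics import mean


def estimate_mean(population, sample_size, rng):
    """Draw a simple random sample and return (sample, sample mean)."""
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sample, mean(sample)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Population: every element under study (hypothetical heights in cm).
population = [rng.randint(150, 190) for _ in range(10_000)]

# Parameter: a numerical characteristic of the whole population.
parameter = mean(population)

# Statistic: the same characteristic computed from a sample only.
sample, statistic = estimate_mean(population, 100, rng)

print(f"parameter (population mean) = {parameter:.2f}")
print(f"statistic (sample mean)     = {statistic:.2f}")
```

The statistic will usually land near the parameter without matching it exactly, which is the gap that inferential statistics tries to quantify.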