Quantitative method-data analysis tool

Cards (38)

  • Parametric tests assume that the data follow a particular distribution, e.g. t-tests, ANOVA and regression assume normality. They are more powerful than non-parametric tests when the assumptions about the distribution of the data are true.
  • Plotting a histogram or QQ plot of the variable of interest will give an indication of the shape of the distribution. For normally distributed data, the histogram should peak in the middle and be approximately symmetrical about the mean, and the points in the QQ plot will lie close to the line.
  • There are statistical tests for normality, such as the Shapiro-Wilk and Kolmogorov-Smirnov tests.
  • Sample data is used to choose between two competing hypotheses, i.e. statements about a population.
  • Hypothesis testing is an objective method of making decisions or inferences from sample data (evidence).
  • NULL HYPOTHESIS is a statement about the population; sample data is used to decide whether to reject that statement or not. Typically the statement is that there is no difference between groups or no association between variables.
  • ALTERNATIVE HYPOTHESIS is often the research question and varies depending on whether the test is one or two tailed.
  • SIGNIFICANCE LEVEL: The probability of rejecting the null hypothesis when it is true (also known as a type 1 error). This is decided by the individual but is normally set at 5% (0.05), which means that there is a 1 in 20 chance of rejecting the null hypothesis when it is true.
  • TEST STATISTIC is a value calculated from a sample to decide whether to reject or fail to reject the null (H0), and it varies between tests. It compares differences between the samples, or between observed values and the values expected when the null hypothesis is true.
  • P-value is the probability of obtaining a test statistic at least as extreme as ours if the null is true and there really is no difference or association in the population of interest. This is also calculated using different probability distributions depending on the test. A significant result is when the p-value is less than the chosen level of significance (usually 0.05).
  • Independent t-test is used to compare the means of two independent groups. Independent groups means that different people are in each group.
  • The Mann-Whitney test is used to compare whether two groups containing different people are the same or not. It ranks all of the data and then compares the sum of the ranks for each group to determine whether the groups are the same or not.
  • There are two types of Mann-Whitney U tests. If the distributions of scores for both groups have the same shape, the medians can be compared. If not, use the default test, which compares the mean ranks.
  • A paired samples t-test can only be used when the data is paired or matched. Either there are before/after measurements of the same variable, or the t-test can be used to compare how a group of subjects perform under two different test conditions. The test assesses whether the mean of the paired differences is zero.
  • The Wilcoxon signed rank test is used to compare two related samples, matched samples or repeated measurements on a single sample to assess whether their population mean ranks differ. It is a paired difference test and is the non-parametric alternative to the paired t-test.
  • One way ANOVA is used to detect the difference in means of 3 or more independent groups. It can be thought of as an extension of the t-test for 3 or more independent groups.
  • Kruskal-Wallis compares the medians of two or more samples to determine if the samples have come from different populations. It is an extension of the Mann–Whitney U test to 3 or more groups. The distributions do not have to be normal and the variances do not have to be equal.
  • One way ANOVA with repeated measures (within subject) tests the equality of means in 3 or more groups. Each sample member must be measured under multiple conditions, i.e. the dependent variable is repeated. Standard ANOVA cannot be used because the assumption of independence has been violated.
  • The Friedman test is used to detect differences in scores across multiple occasions or conditions. The scores for each subject are ranked and then the sums of the ranks for each condition are used to calculate a test statistic. It can also be used when subjects have ranked a list e.g. rank these pictures in order of preference.
  • Two way ANOVA is used to compare means for combinations of two independent categorical variables (factors).
  • The chi-squared test: the null hypothesis is that there is no relationship/association between the two categorical variables. It compares expected frequencies, assuming the null is true, with the observed frequencies from the study. When obtaining a significant chi-squared result, calculate percentages in a table to summarise where the differences between the groups are.
  • Correlation (r) is used to measure the strength of association between two variables and ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation).
  • Pearson’s correlation coefficient is the most common measure of correlation. rho (ρ) = population correlation and r = sample correlation.
  • Spearman’s rank correlation coefficient is a non-parametric statistical measure of the strength of a monotonic relationship between paired data.
  • Kendall’s tau rank correlation coefficient is used to measure the association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient. Specifically, it is a measure of rank correlation, i.e. the similarity of the orderings of the data when ranked by each of the quantities.
  • Parametric test - what to check for normality
  • Independent t-test - dependent variable by group
  • Paired t-test - paired differences
  • One way ANOVA - residuals (the difference between the predicted value of your data and the actual value of your data)
  • Repeated measures ANOVA - residuals at each time point
  • Pearson’s correlation coefficient - both variables are normally distributed
  • Simple linear regression - residuals (errors)
  • VARIABLES
    SCALE - measurement/count
    CATEGORICAL - tick boxes on a questionnaire
  • Scale is Continuous, such as height,
    or Discrete, e.g. number of children
  • Categorical is Ordinal (obvious order, such as a Likert scale) or Nominal (no meaningful order, such as gender)
  • SCALE
    Normally distributed - mean (sd)
    Skewed data - median (interquartile range)
  • CATEGORICAL
    Ordinal - median (interquartile range)
    Nominal - mode (none)
  • DEPENDENT VARIABLE: SCALE
    Normally distributed - Parametric test
    Skewed data - Non-parametric test
    DEPENDENT VARIABLE: CATEGORICAL
    Ordinal - Non-parametric test
    Nominal - Chi-squared test
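
The Mann-Whitney card above says the test ranks all of the data and then compares the sum of the ranks for each group. A minimal Python sketch of that ranking step follows. The function names and the scores are made up for illustration, tied values are assumed to share their average rank, and no p-value is computed (in practice a statistics package would do that).

```python
def average_ranks(values):
    """Return 1-based ranks for values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (rank_sum_a, rank_sum_b, u_a, u_b) for two independent groups."""
    # Rank the pooled data, then split the ranks back into the two groups.
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    # U for each group: its rank sum minus the smallest rank sum it could have.
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    return rank_sum_a, rank_sum_b, u_a, u_b


# Hypothetical scores, for illustration only.
control = [12, 15, 11, 18, 14]
treated = [16, 19, 15, 21, 20]
print(mann_whitney_u(control, treated))  # (17.5, 37.5, 2.5, 22.5)
```

The two U values always add up to the product of the group sizes (here 5 × 5 = 25). The further the smaller U is from half of that product, the stronger the evidence that the groups differ.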