R Lang

Cards (35)

  • Random Variable
    A rule that assigns a numerical value to each outcome in a sample space
  • Types of Random Variable
    • Discrete
    • Continuous
  • Discrete Random Variable
    Can take only a countable number of distinct values, such as 0, 1, 2, 3, 4, 5, and so on. It is a variable whose value is obtained by counting.
  • Continuous Random Variable
    The set of its possible values is uncountable, such as every value in an interval. It is a variable whose value is obtained by measuring.
  • Plotting Single Continuous Random Variable
    • Histogram
    • Density Plot
    • Box-Whisker Plot
  • Use str() to check the structure of a dataset
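    A minimal sketch of checking a dataset's structure. It assumes the built-in chickwts dataset, which the histogram cards later in this deck also use; the variable names dims and classes are only illustrative.

```r
# chickwts ships with base R (datasets package), so nothing needs to be loaded.
str(chickwts)                        # compact overview of rows, columns, and column types

dims <- dim(chickwts)                # number of rows and columns
classes <- sapply(chickwts, class)   # class of each column
print(dims)
print(classes)
```

    A numeric column such as weight is a candidate for a histogram, density plot, or box plot; a factor column such as feed is a candidate for a bar graph or pie chart.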
  • Histogram
    A graphical display of data using bars of different heights. It is similar to a Bar Chart, but a histogram groups numbers into ranges. Histograms are a great way to show results of continuous data.
  • When the data is in categories (such as Ranks or Sections), we should use a Bar Chart instead of a Histogram
  • Creating a Histogram
    1. hist()
    2. Add x-axis label
    3. Add y-axis label
    4. Add title
    5. Adjust number of bins
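    The five steps above can be combined into one call. This is a sketch that assumes the built-in chickwts dataset; the label text is illustrative. When breaks is a single number, R treats it as a suggestion and picks "pretty" break points, so the number of bins drawn may differ from the number requested.

```r
# hist() draws the plot and also returns the binning it used.
h <- hist(chickwts$weight,            # step 1: draw the histogram
          xlab = "Weight",            # step 2: x-axis label
          ylab = "Frequency",         # step 3: y-axis label
          main = "Chicken Weights",   # step 4: title
          breaks = 16)                # step 5: suggested number of bins

length(h$counts)   # number of bins R actually drew
sum(h$counts)      # total observations counted across all bins
```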
  • Density Plot
    Visualizes the distribution of data over a continuous interval or time period. This chart is a variation of a Histogram that smooths the data into a curve. An advantage Density Plots have over Histograms is that they show the shape of the distribution more clearly, because the shape does not depend on the number of bins.
  • Creating a Density Plot
    plot(density(variable))
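    As a sketch of the call above, assuming the built-in chickwts dataset: density() computes the smoothed estimate and plot() draws it. The histogram overlay is an optional extra, not part of the card; freq = FALSE puts the histogram on the same density scale as the curve.

```r
d <- density(chickwts$weight)   # kernel density estimate of the weights
plot(d, main = "Density of Chicken Weights", xlab = "Weight")

# Optional: draw the curve on top of a histogram for comparison.
hist(chickwts$weight, freq = FALSE,
     main = "Chicken Weights", xlab = "Weight")
lines(d)                        # add the density curve to the current plot
```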
  • Box Plot (Box and Whisker Plot)
    Displays the five-number summary of a set of data: minimum, first quartile, median, third quartile, and maximum.
  • Creating a Box Plot
    1. boxplot(variable)
    2. Make it horizontal
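    Both steps in one call, as a sketch assuming the built-in chickwts dataset (title and label are illustrative). boxplot() also returns the statistics it drew, which is a quick way to read off the five-number summary; note that the whisker ends can differ from the true minimum and maximum when outliers are present.

```r
b <- boxplot(chickwts$weight,
             horizontal = TRUE,          # step 2: make it horizontal
             main = "Chicken Weights",
             xlab = "Weight")

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
```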
  • Range
    The distance between the smallest (minimum) and the largest (maximum) data points
  • Interquartile Range (IQR)
    The distance between the first quartile (Q1 = 25%) and the third quartile (Q3 = 75%) in the data
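    A short sketch of computing the Range and the IQR in R, assuming the built-in chickwts dataset. Note that range() returns the minimum and maximum rather than the distance between them, so diff() is used to get a single number.

```r
w <- chickwts$weight

r <- range(w)                       # c(minimum, maximum)
spread <- diff(r)                   # the Range as a distance: maximum - minimum

q <- quantile(w, c(0.25, 0.75))     # Q1 and Q3
iqr <- IQR(w)                       # Q3 - Q1, using the same default quantile method

print(spread)
print(iqr)
```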
  • Bar Graph
    Plots numeric values for levels of a categorical feature as bars. Levels are plotted on one chart axis, and values are plotted on the other axis.
  • Bar Graphs are good when your data is in categories
  • Horizontal bar graphs are best for nominal variables; vertical bar graphs are best for ordinal variables
  • Bar graphs can also be used to show changes over a period of time
  • Creating a Bar Graph
    1. table(variable)
    2. plot(table(variable))
    3. barplot(table(variable))
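    The steps above, sketched with the feed column of the built-in chickwts dataset (an assumption; any categorical variable works). table() counts how many observations fall in each category, and barplot() draws one bar per category. The labels are illustrative.

```r
counts <- table(chickwts$feed)     # step 1: frequency of each category
print(counts)

plot(counts)                       # step 2: quick look, drawn as vertical lines

barplot(counts,                    # step 3: a proper bar graph
        xlab = "Feed", ylab = "Number of chicks",
        main = "Chicks per Feed Type")

# Horizontal version; las = 1 keeps the category names readable.
barplot(counts, horiz = TRUE, las = 1)
```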
  • Scatterplot
    A graph of plotted points that show the relationship between two sets of data. The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them.
  • Scatterplots are used when we have paired numerical data, when there are multiple values of the dependent variable for a single value of the independent variable, and when determining the relationship between two variables
  • Types of Correlation in Scatterplots
    • Positive Correlation
    • Negative Correlation
    • No Correlation
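    The three types can also be checked numerically with cor(), which returns a value between -1 and 1. This sketch assumes the built-in cars and mtcars datasets for the first two cases; the last pair of vectors is made up purely to illustrate a correlation of zero.

```r
# Positive: faster cars need longer stopping distances.
r_pos <- cor(cars$speed, cars$dist)

# Negative: heavier cars get fewer miles per gallon.
r_neg <- cor(mtcars$wt, mtcars$mpg)

# None: hypothetical toy data with no linear trend.
x <- c(1, 2, 3, 4)
y <- c(1, -1, -1, 1)
r_none <- cor(x, y)

print(c(positive = r_pos, negative = r_neg, none = r_none))
```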
  • Creating a Scatterplot
    1. plot(x, y)
    2. Add axis labels and title
    3. Add jitter
    4. Add abline (horizontal, vertical, trendline)
    5. Change plot character
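    The five steps above, sketched with the built-in cars dataset (an assumption, chosen because it has two numeric columns). The labels, line types, and plot character are illustrative. jitter() adds a small amount of random noise so that overlapping points become visible, so the picture changes slightly on each run.

```r
x <- cars$speed
y <- cars$dist

plot(jitter(x), y,                           # steps 1 and 3: scatterplot with jittered x
     xlab = "Speed (mph)",                   # step 2: axis labels and title
     ylab = "Stopping distance (ft)",
     main = "Speed vs. Stopping Distance",
     pch = 19)                               # step 5: solid circle as plot character

abline(h = mean(y), lty = 2)                 # step 4: horizontal line at the mean of y
abline(v = mean(x), lty = 2)                 #         vertical line at the mean of x
fit <- lm(y ~ x)                             #         fit a straight line to the data
abline(fit)                                  #         trendline from the fitted model
```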
  • Pie Chart
    A type of graph that represents the data in a circular graph. The slices of pie show the relative size of the data, and it is a type of pictorial representation of data. A pie chart requires a set of categories and a numerical value for each category.
  • Pie charts are ideal for categorical data groups since every single slice can show a specific category
  • Pie charts are best for showing percentages of a whole, while bar graphs are better for comparing different categories of data or tracking changes over time
  • Creating a Pie Chart
    1. pie(values)
    2. Add labels and title
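    Both steps, sketched with the feed column of the built-in chickwts dataset (an assumption). The percentage labels are an illustrative choice; pie() only needs the values.

```r
counts <- table(chickwts$feed)               # one value per category
pct <- round(100 * counts / sum(counts))     # share of each category, in percent

pie(counts,                                  # step 1: draw the slices
    labels = paste0(names(counts), " (", pct, "%)"),   # step 2: labels
    main = "Chicks by Feed Type")            #         and title
```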
  • Syntax for Histogram is hist().
  • Syntax for Density Plot is plot(density()).
  • Syntax for Box Plot is boxplot().
  • Give a sample syntax for creating more bins in the histogram.
    hist(chickwts$weight, xlab = "Weight", ylab = "Frequency", main = "Chicken Weights", breaks = 16)
  • Give a sample syntax for creating specific bins in the histogram.
    hist(chickwts$weight, xlab = "Weight", ylab = "Frequency", main = "Chicken Weights", breaks = c(100, 200, 300, 400, 500))
  • Give a sample syntax for adding labels to the histogram.
    hist(chickwts$weight, xlab = "Weight", ylab = "Frequency", main = "Chicken Weights", labels = TRUE)
  • Give a sample syntax for adding colors to the histogram.
    hist(chickwts$weight, xlab = "Weight", ylab = "Frequency", main = "Chicken Weights", labels = TRUE, col = c("red", "green", "blue"))