A rule that assigns a numerical value to each outcome in a sample space
Types of Random Variable
Discrete
Continuous
Discrete Random Variable
Can take only a countable number of distinct values, such as 0, 1, 2, 3, 4, 5, and so on (the list may be finite or countably infinite). It is a variable whose value is obtained by counting.
Continuous Random Variable
The set of its possible values is uncountable, such as every value in an interval. It is a variable whose value is obtained by measuring.
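A minimal sketch of the counting versus measuring distinction, using simulated data. The coin-flip and height scenarios, the seed, and the variable names are assumptions made for illustration only.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

# Discrete: number of heads in 10 coin flips, repeated 5 times.
# Values are obtained by counting, so they are whole numbers from 0 to 10.
heads <- rbinom(5, size = 10, prob = 0.5)

# Continuous: simulated heights in centimetres.
# Values are obtained by measuring, so they can fall anywhere in an interval.
heights <- rnorm(5, mean = 170, sd = 10)

print(heads)
print(heights)
```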
Plotting Single Continuous Random Variable
Histogram
Density Plot
Box-Whisker Plot
Use the function str() to check the structure of the dataset before plotting
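As a quick illustration, str() can be called on any data frame. The notes do not name a dataset, so the built-in mtcars data is assumed here as a stand-in.

```r
# mtcars ships with base R; it is only a stand-in for your own dataset
data(mtcars)

# Prints the number of observations and variables, then one line per
# column showing its name, type, and first few values
str(mtcars)
```

Columns reported as num or int are candidates for histograms, density plots, and box plots.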
Histogram
A graphical display of data using bars of different heights. It is similar to a Bar Chart, but a histogram groups numbers into ranges. Histograms are a great way to show results of continuous data.
When the data is in categories (such as Ranks or Sections), we should use a Bar Chart instead of a Histogram
Creating a Histogram
1. hist()
2. Add x-axis label
3. Add y-axis label
4. Add title
5. Adjust number of bins
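The five steps above can be combined into one call. This is a sketch that assumes the built-in mtcars dataset and its mpg column; the label text is made up for the example.

```r
x <- mtcars$mpg  # assumed example variable (continuous)

h <- hist(x,
          xlab   = "Miles per gallon",              # step 2: x-axis label
          ylab   = "Number of cars",                # step 3: y-axis label
          main   = "Distribution of fuel economy",  # step 4: title
          breaks = 10)                              # step 5: suggested number of bins

# hist() also returns the bin edges and counts it used
h$breaks
h$counts
```

When breaks is a single number, R treats it as a suggestion and rounds the bin edges to convenient values, so the actual number of bins may differ.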
Density Plot
Visualizes the distribution of data over a continuous interval or time period. This chart is a variation of a Histogram that draws a smooth curve instead of bars. An advantage Density Plots have over Histograms is that they're better at showing the shape of the distribution, because the shape does not depend on the number of bins chosen.
Creating a Density Plot
plot(density(variable))
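A short sketch of the call above, assuming the built-in faithful dataset and its eruptions column as the variable. The title and axis label are illustrative.

```r
x <- faithful$eruptions  # assumed example variable (continuous)

d <- density(x)          # kernel density estimate with default settings

plot(d,
     main = "Density of eruption durations",
     xlab = "Eruption time (minutes)")
```

The smoothness of the curve is controlled by the bandwidth (the bw argument of density()), which plays a role similar to the bin width in a histogram.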
Box Plot (Box and Whisker Plot)
Displays the five-number summary of a set of data: minimum, first quartile, median, third quartile, and maximum.
Creating a Box Plot
1. boxplot(variable)
2. Make it horizontal
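Both steps can be sketched as follows, again assuming mtcars$mpg as the variable. The labels are illustrative.

```r
x <- mtcars$mpg  # assumed example variable

b <- boxplot(x,
             horizontal = TRUE,  # step 2: draw the box sideways
             main = "Fuel economy",
             xlab = "Miles per gallon")

# The five values drawn: lower whisker, lower hinge, median,
# upper hinge, upper whisker
b$stats

# Tukey's five-number summary of the same data, for comparison
fivenum(x)
```

By default R draws the whiskers only as far as the most extreme points within 1.5 times the box length, and plots anything beyond that as separate points, so the whisker ends match the minimum and maximum only when there are no such points.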
Range
The distance between the smallest (minimum) and the largest (maximum) data points
Interquartile Range (IQR)
The distance between the first quartile (Q1 = 25%) and the third quartile (Q3 = 75%) in the data, that is, IQR = Q3 - Q1
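A small worked example of both measures of spread. The data vector is made up for illustration.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)  # hypothetical data

r <- range(x)      # returns c(minimum, maximum)
spread <- diff(r)  # maximum - minimum
spread             # 10

q <- quantile(x, c(0.25, 0.75))  # Q1 and Q3 (R's default quantile method)
iqr <- IQR(x)                    # same as Q3 - Q1
iqr                              # 5.25
```

Note that range() returns the two end points rather than their difference, so diff() is needed to get the distance.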
Bar Graph
Plots numeric values for levels of a categorical feature as bars. Levels are plotted on one chart axis, and values are plotted on the other axis.
Bar Graphs are good when your data is in categories
Horizontal bar graphs are best for nominal variables; vertical bar graphs are best for ordinal variables
Bar graphs can also be used to show changes over a period of time, with one bar per time period
Creating a Bar Graph
1. table(variable)
2. plot(table(variable))
3. barplot(table(variable))
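The steps above, sketched with the cyl column of the built-in mtcars dataset as an assumed categorical variable. The labels are illustrative.

```r
# Step 1: count how many observations fall in each category
cyl_counts <- table(mtcars$cyl)
cyl_counts

# Step 2: plot() on a one-way table draws one vertical line per category
plot(cyl_counts)

# Step 3: barplot() draws proper bars and accepts labels
barplot(cyl_counts,
        xlab = "Number of cylinders",
        ylab = "Number of cars",
        main = "Cars by cylinder count")

# horiz = TRUE turns the bars sideways
barplot(cyl_counts, horiz = TRUE)
```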
Scatterplot
A graph of plotted points that show the relationship between two sets of data. The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them.
Scatterplots are used when we have paired numerical data, when there are multiple values of the dependent variable for a unique value of an independent variable, and in determining the relationship between variables
Types of Correlation in Scatterplots
Positive Correlation
Negative Correlation
No Correlation
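The three cases can be illustrated with simulated data and the cor() function, which returns a value between -1 and 1. The seed, sample size, and noise level are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed so the example is reproducible

x <- 1:100
noise <- rnorm(100, sd = 10)

y_pos  <- 2 * x + noise   # y tends to rise as x rises
y_neg  <- -2 * x + noise  # y tends to fall as x rises
y_none <- rnorm(100)      # y is generated independently of x

cor(x, y_pos)   # close to +1: positive correlation
cor(x, y_neg)   # close to -1: negative correlation
cor(x, y_none)  # close to 0: no correlation

# Side-by-side scatterplots of the three cases
old_par <- par(mfrow = c(1, 3))
plot(x, y_pos,  main = "Positive")
plot(x, y_neg,  main = "Negative")
plot(x, y_none, main = "None")
par(old_par)
```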
Creating a Scatterplot
1. plot(x, y)
2. Add axis labels and title
3. Add jitter
4. Add abline (horizontal, vertical, trendline)
5. Change plot character
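The five steps above can be sketched as follows. The choice of mtcars columns, the labels, and the line positions are assumptions made for the example.

```r
x <- mtcars$cyl  # few distinct values, so points overlap without jitter
y <- mtcars$mpg

plot(jitter(x), y,                     # steps 1 and 3: scatterplot with jittered x
     xlab = "Number of cylinders",     # step 2: axis labels and title
     ylab = "Miles per gallon",
     main = "Fuel economy by cylinder count",
     pch  = 19)                        # step 5: solid circles as the plot character

# Step 4: reference lines
abline(h = mean(y), lty = 2)  # horizontal line at the mean of y
abline(v = 6, lty = 3)        # vertical line at x = 6
fit <- lm(y ~ x)              # simple linear regression
abline(fit, col = "red")      # trendline from the fitted model
```

jitter() adds a small amount of random noise so that overlapping points become visible; the trendline is fitted to the original, unjittered values.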
Pie Chart
A type of graph that represents the data in a circular graph. The slices of pie show the relative size of the data, and it is a type of pictorial representation of data. A pie chart requires a list of categories and a numerical value for each category.
Pie charts are ideal for categorical data groups since every single slice can show a specific category
Pie charts are best for showing percentages of a whole, while bar graphs are better for comparing different categories of data or tracking changes over time
Creating a Pie Chart
1. pie(values)
2. Add labels and title
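A sketch of both steps. The categories and values below are made up for illustration.

```r
# Hypothetical data: one numeric value per category
values <- c(40, 35, 25)
groups <- c("Section A", "Section B", "Section C")

# Percentage of the whole for each slice
pct <- round(100 * values / sum(values))
slice_labels <- paste0(groups, " (", pct, "%)")

pie(values,
    labels = slice_labels,       # step 2: labels
    main   = "Share by section") # step 2: title
```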
The function for a Histogram is hist().
The syntax for a Density Plot is plot(density()).
The function for a Box Plot is boxplot().
Give a sample syntax for creating more bins in the histogram.
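One possible answer uses the breaks argument of hist(). The faithful dataset and the value 30 are assumptions chosen for the example.

```r
x <- faithful$waiting  # assumed example variable

h_default <- hist(x)               # default number of bins
h_more    <- hist(x, breaks = 30)  # ask for roughly 30 bins

length(h_default$counts)  # number of bins R used by default
length(h_more$counts)     # number of bins after raising breaks
```

A single number passed to breaks is a suggestion that R rounds to convenient bin edges; passing a vector such as seq(40, 100, by = 2) sets the edges exactly.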