For this lab you should submit, on Blackboard, your .Rmd and .docx files at the end of the lab hour.
The dataset mpg contains car models with a range of features including engine volume, cylinder count, drive type, and mileages for city and highway driving.
Two-way and N-way tables
Commands to construct and modify two-way tables were given in the slides from the last lecture. Use them for the following tasks.
Task Create a 3-way table describing drv, fl and class.
Task Create a margin table that retains drv and class.
Task Create a proportional table with conditional proportions of drv conditioned on class.
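As a reminder of the kind of commands involved, here is a minimal sketch on a different set of variables (cyl, year and drv). It assumes the base R functions table, margin.table and prop.table; the lecture slides may have used other commands, such as tally, for the same purpose.

```r
library(ggplot2)  # only needed here because it provides the mpg dataset

# Three-way contingency table: counts for every cyl/year/drv combination.
tab3 <- table(cyl = mpg$cyl, year = mpg$year, drv = mpg$drv)

# Margin table: sum over drv, keeping dimensions 1 (cyl) and 2 (year).
tab2 <- margin.table(tab3, margin = c(1, 2))

# Conditional proportions of cyl given year: each year column sums to 1.
prop <- prop.table(tab2, margin = 2)
prop
```

The margin argument always names the dimensions to keep (for margin.table) or to condition on (for prop.table).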
Plotting gallery
We will explore a selection of possible plot types. For these plots, we will use a dataset that provides paired data of each kind: numeric-numeric, numeric-categorical and categorical-categorical.
Simple plotting interface: ggformula
We have been using ggformula for plotting. The basic structure of a ggformula command is
command(response ~ predictor | splitter, data=dataset)
Additional features can be connected to data by using color = ~ variable, fill = ~ variable, size = ~ variable, shape = ~ variable, ...
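To make the structure concrete, here is a small sketch that fills in each slot of the formula. The choice of variables (hwy, displ, cyl, drv) is only for illustration.

```r
library(ggplot2)    # provides the mpg dataset
library(ggformula)

# response ~ predictor | splitter:
# highway mileage against engine displacement, one panel per cylinder count.
# color = ~ drv connects the point color to the drive type.
p <- gf_point(hwy ~ displ | cyl, data = mpg, color = ~ drv)
p
```

Note the tilde in color = ~ drv: it marks drv as a variable in the dataset rather than a fixed color name.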
Fine-tuning details: ggplot2
The library ggformula is built on top of the plotting library ggplot2.
Where ggformula uses %>% to layer plots on top of each other, ggplot2 uses +.
Since ggformula is built on top of ggplot2, any ggformula plot can be tweaked using ggplot2 commands.
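A minimal sketch of mixing the two styles follows. The particular tweaks (labs and theme_minimal) are just illustrative choices; any ggplot2 command could be added in the same way.

```r
library(ggplot2)
library(ggformula)

# Build the plot in ggformula style, layering with %>% ...
p <- gf_point(hwy ~ displ, data = mpg) %>%
  gf_smooth(hwy ~ displ, se = FALSE)

# ... then tweak the result in ggplot2 style, adding components with +.
p2 <- p +
  labs(x = "Engine displacement (litres)", y = "Highway mileage (mpg)") +
  theme_minimal()
p2
```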
To start a ggplot2 plot, we use the ggplot command. Either ggplot or later commands take the arguments data, to provide a dataset, and mapping, to provide an aesthetic mapping.
Most often you will want to give both the dataset and the aesthetic mapping in the ggplot command itself; that way they are set and ready for every subsequent component you add to the plot.
Task Try it out by running:
ggplot(mpg)
Describe the result of this command.
Aesthetic Mappings
ggplot2 builds fundamentally on connecting aspects of data to aspects describing a plot.
The way to connect data to aspects of the plot is through aesthetic mappings.
These are produced using the command aes, taking as parameters the actual properties.
Most commonly used properties include x, y, color, fill, shape, size.
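An aesthetic mapping is an object in its own right, which can help when reasoning about what a plot will use. The sketch below builds one without drawing anything; the variables chosen are only for illustration.

```r
library(ggplot2)

# aes() only records which variable feeds which plot property; nothing is drawn.
m <- aes(x = displ, y = hwy, color = drv)

# List the properties that have been mapped.
names(m)
```

ggplot2 accepts both spellings of color, and stores the property under the name colour.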
Task Let's add some aesthetic mappings too, by running
ggplot(mpg, aes(x=cty, y=hwy, color=class, shape=drv))
Describe the result of this command.
Adding geometry
The ggplot command, with or without dataset and aesthetic mapping, will not actually draw anything.
To put shapes on the plot, we need geometries.
All geometry commands start with geom_, and take different aesthetic mappings depending on which geometry you are using.
You can find out which aesthetic mappings a geometry accepts by looking at its help file.
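The sketch below shows how mappings and geometries interact, using a box plot rather than the scatter plot of the next task. Mappings given in ggplot are shared by every geometry, while a mapping given inside one geom_ command applies to that layer only. The variables and the jitter width are illustrative choices.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +  # x and y are shared by both layers
  # Box plot per class; outliers hidden so they are not drawn twice.
  geom_boxplot(outlier.shape = NA) +
  # Raw data points on top; the color mapping applies to this layer only.
  geom_jitter(aes(color = drv), width = 0.2, height = 0)
p
```

Running ?geom_boxplot or ?geom_jitter lists the aesthetics each geometry understands.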
Task Let's produce a scatter plot.
Use the ggplot command composed in the previous task, and add geom_point().
Describe the resulting plot, and how each of the aesthetic mappings has influenced the plot itself.
Some interesting geometries to use include:
ggformula Command | ggplot2 Command | Effect |
---|---|---|
Single variable plots | | |
gf_bar | geom_bar | Bar chart (will count entries) |
gf_col | geom_col | Bar chart (will use provided values) |
gf_boxplot | geom_boxplot | Box plot |
gf_density | geom_density | Density estimate (smooth histogram) |
gf_dotplot | geom_dotplot | Dot plot |
gf_freqpoly | geom_freqpoly | Frequency curve |
gf_histogram | geom_histogram | Histogram |
gf_qq | geom_qq | Quantile plot |
gf_qqline | geom_qq_line | Quantile plot guide line |
gf_rug, gf_rugx, gf_rugy | geom_rug | Rug plot (markers at the bottom for each data point; combine with histogram) |
gf_violin | geom_violin | Violin plot (boxplot with full density distribution graph) |
Two variable plots | | |
gf_point | geom_point | Scatter plot |
gf_count | geom_count | Scatter plot with points scaled by co-occurring values |
gf_jitter | geom_jitter | Scatter plot with randomly displaced points |
gf_bin2d | geom_bin2d | Heatmap (square bins) |
gf_hex | geom_hex | Heatmap (hexagonal bins) |
gf_density_2d | geom_density_2d | 2d density estimate (smooth heatmap) |
gf_line | geom_line | Line plot |
gf_smooth | geom_smooth | Smoothed curve fitted to scatterplot |
Multiple variable plots | | |
gf_contour | geom_contour | Contour plot of 3d surface |
gf_errorbar | geom_errorbar | Error bar plot |
gf_crossbar | geom_crossbar | Error bar plot |
gf_linerange | geom_linerange | Error bar plot |
gf_pointrange | geom_pointrange | Error bar plot |
gf_raster | geom_raster | Pixel grid |
gf_tile | geom_tile | Rectangular grid |
Utility plots | | |
gf_abline | geom_abline | Straight line |
gf_hline | geom_hline | Horizontal line |
gf_vline | geom_vline | Vertical line |
Adapting the plot: scales
Color schemes, scale adaptations and other transformations can be done using the scale_ commands; coordinate systems are changed with the coord_ commands. Some of the most useful include
Command | Effect |
---|---|
scale_x_log10 | X-axis log scale |
scale_y_log10 | Y-axis log scale |
scale_x_sqrt | X-axis square root transform |
scale_y_sqrt | Y-axis square root transform |
scale_color_viridis_c | Viridis color scheme (numeric data) |
scale_color_viridis_d | Viridis color scheme (categorical data) |
scale_fill_viridis_c | Viridis color scheme (numeric data) |
scale_fill_viridis_d | Viridis color scheme (categorical data) |
coord_polar | Polar coordinates |
coord_flip | Swap x and y axes |
coord_equal | Fix aspect ratio (circles are round...) |
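Scale commands are added to a plot with + like any other component. A minimal sketch, with variables and transforms chosen only for illustration:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_log10() +  # x-axis on a log scale
  scale_y_sqrt()     # y-axis on a square root scale
p
```

The data itself is unchanged; only the axes are transformed, so the tick labels still show the original units.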
Tasks
Visualize two variables
Task Produce two different plots that visualize the relationship between the cty and hwy variables in the dataset mpg.
Task Produce two different plots that visualize the distribution of cty as split into subpopulations by drv.
To visualize joint distributions of categorical variables, two common methods are dodged bar charts and colored grids. The colored grid version could look something like this:
library(mosaic)  # provides tally(); the pipe and ggplot2 are loaded along with it
ggplot(tally(cyl~drv, data=mpg) %>% as.data.frame(), aes(x=cyl, y=drv, fill=Freq)) +
  geom_raster()
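The dodged bar chart alternative can be sketched for the same cyl and drv pair. This is one possible way to write it, assuming plain ggplot2 and letting geom_bar do the counting instead of tally.

```r
library(ggplot2)

# cyl is stored as a number, so convert it to a factor to get one bar group per value.
p <- ggplot(mpg, aes(x = factor(cyl), fill = drv)) +
  geom_bar(position = "dodge") +  # bars for each drv side by side, not stacked
  labs(x = "cyl")
p
```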
Task Produce two different plots that visualize the joint distribution of drv and class.
Visualize one variable
Task Produce two different plots that visualize the distribution of cty.
Task Produce a plot that visualizes the distribution of drv.
Modifying plots
Task Modify at least one of your plots to use Viridis in its continuous version.
Task Modify at least one of your plots to use Viridis in its discrete version.