For this lab you should submit your .Rmd and .docx files on Blackboard at the end of the lab hour.

Chosen project dataset

Task First in this lab, load your dataset. Pick two numeric variables.

The lab instructions will use the iris dataset and the variables Petal.Length and Petal.Width. You should use your own dataset and your chosen two variables.

Correlation

Correlation is calculated using the cor function:

cor(Petal.Length ~ Petal.Width, data=iris)
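
The formula form of cor shown above is not part of base R; it is assumed here to come from the mosaic package loaded for this course. As a minimal sketch, the same correlation can be computed in base R by passing the two columns as vectors:

```r
# Base R equivalent: pass the two columns as numeric vectors.
r <- cor(iris$Petal.Length, iris$Petal.Width)
r

# The correlation is symmetric: swapping the variables gives the same value.
all.equal(r, cor(iris$Petal.Width, iris$Petal.Length))
```

Either form should give the same number, so use whichever matches the packages you have loaded.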

Task Calculate the correlation between your two variables.

Regression lines and layering plots

A scatterplot can be produced using gf_point:

gf_point(Petal.Length ~ Petal.Width, data=iris)

We can layer several different plots or plot components on top of each other using the %>% pipe operator. Subsequent plot commands in a sequence of pipes do not need to specify any parameters that were already used in previous plot commands.

A regression line is plotted using gf_smooth:

gf_point(Petal.Length ~ Petal.Width, data=iris) %>%
  gf_smooth(method="lm")
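
With method="lm", the line drawn is the least-squares fit of the response on the explanatory variable. If you want to see the numbers behind the line, one possible sketch (using base R's lm, which the lab text itself does not call directly) is:

```r
# Fit the same least-squares line that the plot draws.
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Intercept and slope of the regression line.
coef(fit)
```

The first coefficient is the intercept and the second is the slope for Petal.Width.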

Task Produce a scatterplot with a regression line for your two variables.

Summaries and the importance of plotting

Task Load the dataset from http://www.math.csi.cuny.edu/~mvj/MTH214/static/ds.csv

For this part of the lab it is important to perform the tasks in the given order.

The tidyverse provides tools for grouping data and summarizing each group separately. This allows us to write, e.g.:

ds %>% group_by(d) %>% summarise(mean(x), mean(y), sd(x), sd(y), cor(x,y))

This produces a new dataset that has one row for each value of the variable d in ds and one column each for mean(x), mean(y), sd(x), sd(y), and cor(x,y). Recall that the linear regression model is completely determined by these five values.
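
To see why five summary values are enough, recall that the least-squares slope is the correlation times sd(y)/sd(x), and that the line passes through the point (mean(x), mean(y)). The following sketch checks this on the iris variables used earlier in the lab; the names x, y, slope and intercept are chosen here for illustration only:

```r
# Illustration on iris; x and y are just local names for the two columns.
x <- iris$Petal.Width
y <- iris$Petal.Length

# Regression line rebuilt from the five summary statistics alone.
slope <- cor(x, y) * sd(y) / sd(x)
intercept <- mean(y) - slope * mean(x)

# Compare with the coefficients from a direct least-squares fit.
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(intercept, slope))
```

Two datasets that agree on all five summaries therefore share the same regression line, whatever their scatterplots look like.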

Task Do this for the dataset you loaded.

Task Discuss the resulting data: what can you see in the results? Do you believe the datasets included to be similar?

Task Produce a scatterplot of each dataset separately. You can use the formula y ~ x | d to give each value of d its own subplot.
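
As an illustration of the | syntax on the lab's example data, the sketch below splits the iris scatterplot by Species. It assumes the ggformula package (which provides gf_point) is installed; Species here plays the role that d plays in the downloaded dataset:

```r
library(ggformula)

# One panel per value of Species; the part after | is the grouping variable.
p <- gf_point(Petal.Length ~ Petal.Width | Species, data = iris)
p
```

The same pattern should carry over to your own data by replacing the variable names and the data argument.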

Task Discuss the resulting plots. Compare them to your thoughts based on the summary statistics.

Play the Correlation Game

Task For the remainder of the lab time, and at home, play Guess the Correlation.