For this lab, submit your .Rmd and .html files on Blackboard by the end of the lab hour.

This problem uses a dataset from a study on depression and coffee consumption in women (see OIS exercise 6.48). Let's load the dataset:

study <- read.csv("http://www.math.csi.cuny.edu/~tobiasljohnson/214/coffee.csv", stringsAsFactors = TRUE)

Let's take a closer look:

str(study)

## 'data.frame':    50739 obs. of  2 variables:
##  $ depression: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ coffee    : Factor w/ 5 levels "<=1 cup/week",..: 3 5 3 5 3 4 3 1 3 3 ...

head(study)

##   depression        coffee
## 1          0     1 cup/day
## 2          0 2-6 cups/week
## 3          0     1 cup/day
## 4          0 2-6 cups/week
## 5          0     1 cup/day
## 6          0  2-3 cups/day

The study started in 1996 with 50,739 women who had no symptoms of depression. They were sampled randomly from the population of U.S. women who had never experienced clinical depression. Over the next ten years, the researchers collected information on coffee consumption and the development of depression. The variable depression records whether each woman experienced clinical depression during that period, with 0 meaning no and 1 meaning yes. The variable coffee gives each woman's average intake of coffee.
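To get a feel for how the two variables are coded, it can help to tabulate them. The following is a minimal sketch on a small made-up data frame named toy (a hypothetical stand-in, not the real data); it uses only the coffee levels visible in the output above, and the same commands should work on study.

```r
# Toy data with the same two columns as `study`. The values are made up
# for illustration and are NOT taken from the real dataset.
toy <- data.frame(
  depression = c(0, 1, 0, 0, 1, 0, 0, 0),
  coffee = factor(c("1 cup/day", "1 cup/day", "2-6 cups/week", "2-3 cups/day",
                    "<=1 cup/week", "<=1 cup/week", "1 cup/day", "2-6 cups/week"))
)

# Number of women in each coffee category
coffee_counts <- table(toy$coffee)
print(coffee_counts)

# Because depression is coded 0/1, its mean is the proportion of cases
prop_depressed <- mean(toy$depression)
print(prop_depressed)
```

Replacing toy with study gives the corresponding counts and overall proportion for the real data.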

Your assignment is to determine whether coffee drinking is associated with depression. Use a 5% significance level.

1)

What test will you use? State hypotheses for it.

2)

Check the conditions for your test.

3)

Carry out the test. Give the p-value and explain what it means. State whether you reject the null hypothesis or not.

4)

Is this an experiment or an observational study? Can you make any conclusions about causality from this study?

5)

If you have extra time, use tables or plots to investigate whether there is an association between coffee drinking and depression and, if so, in which direction it runs. (If you make any plots, you might find it helpful to convert the depression variable from numerical to categorical using the factor command. See Lab 2, for example.)
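One possible way to set up such a table and plot is sketched below. It runs on a small made-up data frame named toy (hypothetical values, not the real data), and the names depression_f, counts, and row_props are illustrative choices rather than anything from Lab 2.

```r
# Toy data with the same two columns as `study`. The values are made up
# for illustration and are NOT taken from the real dataset.
toy <- data.frame(
  depression = c(0, 1, 0, 0, 1, 0, 0, 0),
  coffee = factor(c("1 cup/day", "1 cup/day", "2-6 cups/week", "2-3 cups/day",
                    "<=1 cup/week", "<=1 cup/week", "1 cup/day", "2-6 cups/week"))
)

# Convert the 0/1 codes to a categorical variable with readable labels
toy$depression_f <- factor(toy$depression, levels = c(0, 1),
                           labels = c("no", "yes"))

# Two-way table of counts: coffee intake (rows) by depression (columns)
counts <- table(toy$coffee, toy$depression_f)

# Row proportions: within each coffee level, the share with and without depression
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))

# Bar plot of the proportion with depression at each coffee level
barplot(row_props[, "yes"],
        xlab = "Coffee intake", ylab = "Proportion with depression")
```

Comparing the "yes" column across rows shows how the proportion with depression changes from one coffee level to the next. Note that factor levels are ordered alphabetically by default, so the rows may not appear in order of increasing intake.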