For this lab you should submit, on Blackboard, your .Rmd and .html files at the end of the lab hour.
This problem uses a dataset from a study on depression and coffee consumption in women (see OIS exercise 6.48). Let's load the dataset:
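# stringsAsFactors = TRUE keeps coffee as a factor in R >= 4.0, matching the str() output shown here
study <- read.csv("http://www.math.csi.cuny.edu/~tobiasljohnson/214/coffee.csv", stringsAsFactors = TRUE)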
Let's take a closer look:
str(study)
## 'data.frame':    50739 obs. of  2 variables:
##  $ depression: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ coffee    : Factor w/ 5 levels "<=1 cup/week",..: 3 5 3 5 3 4 3 1 3 3 ...
head(study)
##   depression        coffee
## 1          0     1 cup/day
## 2          0 2-6 cups/week
## 3          0     1 cup/day
## 4          0 2-6 cups/week
## 5          0     1 cup/day
## 6          0  2-3 cups/day
The study started with 50,739 women with no symptoms of depression in
1996. They were sampled randomly from the population of U.S. women who
had never experienced clinical depression. The researchers then
collected information on coffee consumption and the development of
depression over the next ten years. The variable depression codes
whether the women experienced clinical depression, with 0 meaning no
and 1 meaning yes. The variable coffee gives each woman's average
intake of coffee.
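Before choosing a test, it can help to see how the two variables are distributed. The sketch below uses a small data frame called toy, which is hypothetical and not part of the lab: its values are invented, and only its column names match study. The same commands should work on study itself.

```r
# Toy data frame with the same two columns as `study`.
# The values are invented for illustration, NOT taken from the real dataset.
toy <- data.frame(
  depression = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L),
  coffee = factor(c("<=1 cup/week", "1 cup/day", "1 cup/day", "2-3 cups/day",
                    "<=1 cup/week", "2-3 cups/day", "1 cup/day", "<=1 cup/week"))
)

# Number of women in each coffee group
coffee_counts <- table(toy$coffee)
coffee_counts

# Because depression is coded 0/1, its mean is the proportion of women
# who developed depression
overall_rate <- mean(toy$depression)
overall_rate
```

With the real data, replacing toy by study gives the group sizes and the overall depression rate among the 50,739 women.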
Your assignment is to try to determine if coffee drinking has any association with depression. Use a 5% significance level.
1) What test will you use? State hypotheses for it.
2) Check the conditions for your test.
3) Carry out the test. Give the p-value and explain what it means. State whether you reject the null hypothesis or not.
4) Is this an experiment or an observational study? Can you make any conclusions about causality from this study?
5) If you have extra time, investigate with tables or plots whether there's
an association between coffee drinking and depression and, if so, in which
direction it runs. (If you make any plots, you might find it helpful
to convert the depression variable from numerical to categorical using
the factor command. See Lab 2, for example.)
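As a rough guide to the mechanics of items 2, 3 and 5, here is one possible workflow. It assumes you settle on a chi-square test of independence; that choice is an assumption of this sketch, not a statement of the expected answer. The data frame toy and its counts are invented for illustration, so the numbers it produces say nothing about the real study.

```r
# Invented counts: 200 women per coffee group, with made-up depression rates.
# `toy` is hypothetical; only its column names match `study`.
toy <- data.frame(
  coffee = factor(rep(c("<=1 cup/week", "1 cup/day", "2-3 cups/day"), each = 200)),
  depression = c(rep(c(1, 0), times = c(70, 130)),
                 rep(c(1, 0), times = c(60, 140)),
                 rep(c(1, 0), times = c(50, 150)))
)

# Convert the 0/1 outcome to a categorical variable with the factor command
toy$depression <- factor(toy$depression, levels = c(0, 1), labels = c("no", "yes"))

# Two-way table of observed counts (rows: coffee group, columns: outcome)
counts <- table(coffee = toy$coffee, depression = toy$depression)
counts

# Row proportions: the share of each coffee group in each outcome category
row_props <- prop.table(counts, margin = 1)
row_props

# Chi-square test of independence on the two-way table
result <- chisq.test(counts)
result$expected   # expected counts under independence, useful for the conditions check
result$p.value    # compare with the 0.05 significance level
```

For the lab itself, the same calls would be applied to study$coffee and study$depression. Comparing the rows of row_props across coffee groups is one way to see the direction of any association.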