Lab 8

Getting Started

A few quick reminders:

  • The console lets you try commands and see the results right away, but nothing you do there is saved.

  • To create the file that you’ll eventually hand in, go to File -> New File -> R Script. Save the file as lab8.R or something like that. In this file, put your answers like this:

    # Exercise 1
    mean(dataset$left.bicep.thickness)
    mean(dataset$right.bicep.thickness)
    # On average, people's right biceps are thicker than their left biceps
    
    # Exercise 2
  • To compile your R file, press Ctrl-Shift-K, or click on the little notebook icon in the toolbar. When it asks for the report output format, choose html.

  • If you load a dataset in the console, this doesn’t make it available in your R file. If you load a dataset in your R file, this doesn’t make it available in the console.

  • To run a line of code from your R file in the console without having to type it in again, put your cursor on the line and press Ctrl-Enter.

  • You can look up a command in help by putting the cursor on it and pressing F1. Or, in the console, enter a question mark followed by the name of the command, like ?mean.

Goals

Today’s lab is designed to review the three fundamental skills covered in chapter 6 for the upcoming midterm:

  • finding a confidence interval
  • performing a test of significance
  • computing the power of a test to detect a given alternative

We’ll use normal distributions in this lab, even when a t-distribution might be more suitable, since this lab is meant to help you study for the midterm and t-distributions aren’t on the midterm. If you’re feeling ambitious, you can go ahead and use t-distributions anyhow.

Exercise 1

We’ll use data from the 2010 General Social Survey, a survey carried out by social scientists each year since 1972. You can download the data at http://www.math.csi.cuny.edu/~maher/teaching/2019/spring/stats/labs/gss.csv

Each observation in the dataset represents an individual who responded to the poll. The variables in the dataset are:

variable   definition
sibs       number of siblings
mntlhlth   number of days out of the last 30 on which the respondent experienced depression or other mental health problems
physhlth   number of days out of the last 30 on which the respondent experienced problems with physical health
age        age

Task. Load the dataset into an object called gss.
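
As a hedged sketch of the loading step: read.csv returns a data frame and accepts a URL as well as a local file path. The miniature table below is made up (its values and its column order are assumptions, not taken from gss.csv) so that the snippet runs without a network connection. The commented line shows the same call pointed at the lab's URL.

```r
# Made-up miniature CSV using the four variable names listed above.
# The values and the column order are assumptions for illustration only.
csv_text <- "sibs,mntlhlth,physhlth,age
2,0,1,34
5,3,0,51
4,NA,2,47"

demo <- read.csv(text = csv_text)  # same function, reading from a string
str(demo)                          # one row per respondent, one column per variable

# For the lab, the call would presumably look like this:
# gss <- read.csv("http://www.math.csi.cuny.edu/~maher/teaching/2019/spring/stats/labs/gss.csv")
```

After loading, str() or head() is a quick way to check that the column names match the table above.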

Exercise 2

You’d like to study mental health and how it relates to family size. Specifically, the question concerns the mental health of people who have four or more siblings. The nationwide average number of days out of the last 30 on which an individual experienced mental health problems is known to be about 3.8. You’d like to know whether this average is different for the population of people with four or more siblings.

Task. In words, give null and alternative hypotheses for this question.

Exercise 3

Task. How many people in the dataset have four or more siblings? For this group, what is the mean number of days in the last 30 in which the individual had mental health problems? What is the standard deviation of this number of days?
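
One way to approach this kind of question is sketched below on a small hypothetical data frame called demo. Its values are made up, so the numbers it produces are not the answers for gss. The handout does not say whether gss.csv contains missing values, so na.rm = TRUE is included as a precaution.

```r
# Hypothetical data frame; these values are made up, not taken from gss.csv.
demo <- data.frame(
  sibs     = c(1, 4, 6, 2, 5, 4),
  mntlhlth = c(0, 2, NA, 10, 4, 0)
)

big_family <- subset(demo, sibs >= 4)  # keep rows with four or more siblings

n_group    <- nrow(big_family)                         # size of the group
group_mean <- mean(big_family$mntlhlth, na.rm = TRUE)  # mean, skipping missing values
group_sd   <- sd(big_family$mntlhlth, na.rm = TRUE)    # sample standard deviation

n_group
group_mean
group_sd
```

If some values of mntlhlth turn out to be missing, sum(!is.na(big_family$mntlhlth)) counts only the respondents who actually contribute to the mean.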

Exercise 4

Recall that the power of a test to detect a given alternative value is the probability of rejecting the null hypothesis in favor of the alternative hypothesis, assuming that the true population parameter is the given alternative value.

Task. Compute the power of the test to detect an alternative value of 4.0 for the mean of the mntlhlth variable for the population of individuals with four or more siblings.

You can assume that the sampling distribution of the sample mean is normal and that the population standard deviation is the same as the sample standard deviation, here and in future exercises.
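
The definition of power above can be turned into a calculation along the following lines. This is a sketch under stated assumptions, not the required solution: it assumes a two-sided test at significance level 0.05 (the level named in Exercise 6), and the sample size n and standard deviation s are made-up placeholders that you would replace with your results from Exercise 3.

```r
mu0    <- 3.8   # value of the mean under the null hypothesis
mu_alt <- 4.0   # alternative value we want to detect
alpha  <- 0.05  # significance level (assumed here)
n      <- 100   # made-up sample size
s      <- 5     # made-up standard deviation

se     <- s / sqrt(n)           # standard deviation of the sample mean
z_crit <- qnorm(1 - alpha / 2)  # two-sided critical value

# Step 1: the test rejects when the sample mean falls outside these cutoffs.
lower <- mu0 - z_crit * se
upper <- mu0 + z_crit * se

# Step 2: probability of landing in the rejection region when the
# true mean is mu_alt.
power <- pnorm(lower, mean = mu_alt, sd = se) +
  (1 - pnorm(upper, mean = mu_alt, sd = se))
power
```

The two steps mirror the definition: first find where the test rejects under the null hypothesis, then ask how likely the sample mean is to land there under the alternative.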

Exercise 5

Task. In words, comment on the power of the test. Do you have enough data to have a good chance of detecting the effect you’re looking for?

Exercise 6

Task. Regardless of your answer in Exercise 5, carry out a test of significance for your hypotheses from Exercise 2. Use a significance level of 0.05. Report the p-value and state your conclusion about the two hypotheses.
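
A minimal sketch of a two-sided z test follows, using the normal approximation this lab allows. The sample mean, standard deviation, and sample size are made-up placeholders, so the p-value it prints is not the answer to this exercise.

```r
mu0  <- 3.8  # value of the mean under the null hypothesis
xbar <- 4.6  # made-up sample mean
s    <- 5    # made-up standard deviation
n    <- 100  # made-up sample size

se <- s / sqrt(n)        # standard deviation of the sample mean
z  <- (xbar - mu0) / se  # standardized test statistic

# Two-sided p-value: probability of a statistic at least this far
# from zero in either direction.
p_value <- 2 * pnorm(-abs(z))
p_value

p_value < 0.05  # TRUE would mean rejecting the null hypothesis at level 0.05
```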

Exercise 7

Task. Compute a 95% confidence interval for the mean of mntlhlth in the population of individuals with four or more siblings.
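
As a sketch of the interval calculation, again with the normal distribution and made-up placeholder numbers rather than the gss results: the interval is the sample mean plus or minus a critical value times the standard deviation of the sample mean.

```r
xbar  <- 4.6   # made-up sample mean
s     <- 5     # made-up standard deviation
n     <- 100   # made-up sample size
level <- 0.95  # confidence level

se     <- s / sqrt(n)                # standard deviation of the sample mean
z_star <- qnorm(1 - (1 - level) / 2) # critical value, about 1.96 for 95%

ci <- c(lower = xbar - z_star * se, upper = xbar + z_star * se)
ci
```

If you are using t-distributions instead, qt() with n - 1 degrees of freedom would take the place of qnorm().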