For this lab, submit your .Rmd and .docx files on Blackboard at the end of the lab hour.

One-sample tests and confidence intervals

(Exercise 8.39) John Kerrich, while a prisoner of war during WW2, tossed a coin 10 000 times and obtained 5067 heads.

Task Is the binomial test appropriate for testing for proportions using this sample?

Task Is the normal approximation test appropriate for testing for proportions using this sample?

Task Using an appropriate method, produce a 95% confidence interval for the proportion of heads from flipping a coin.

Task Using an appropriate method, test the null hypothesis "John Kerrich used a fair coin" against a two-tailed alternative hypothesis.
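
The two approaches named in the tasks above can be sketched in base R as follows. The counts are made up for illustration and are not the lab data. The "at least 10 expected successes and 10 expected failures" threshold is one common rule of thumb and is an assumption here; use the criterion from your course text.

```r
# Hypothetical counts for illustration only (not the lab data)
n  <- 100   # number of trials
x  <- 58    # number of successes
p0 <- 0.5   # proportion under the null hypothesis

# Rule-of-thumb check for the normal approximation (assumed threshold: 10)
normal_ok <- (n * p0 >= 10) && (n * (1 - p0) >= 10)

# Exact binomial test and confidence interval
exact <- binom.test(x, n, p = p0, alternative = "two.sided", conf.level = 0.95)

# Normal-approximation test and confidence interval
approx <- prop.test(x, n, p = p0, alternative = "two.sided", conf.level = 0.95)

# Both results are lists; the interval and p-value can be pulled out by name
exact$conf.int
exact$p.value
approx$conf.int
approx$p.value
```

Note that prop.test applies a continuity correction by default (correct = TRUE), so its p-value can differ slightly from a hand calculation without the correction.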


When preparing for this lab, I rolled a 6-sided die 10 times. My results were: 2, 1, 1, 5, 1, 6, 6, 5, 1, 6.

Task Count the number of 1s. These are the successes; every other result is a failure.

Task Is the binomial test appropriate for testing for proportions using this sample?

Task Is the normal approximation test appropriate for testing for proportions using this sample?

Task Using an appropriate method, produce a 95% confidence interval for the proportion of 1s with this die.

Task Using an appropriate method, test the null hypothesis "The die is fair" against a two-tailed alternative hypothesis.


How to create a matrix of numbers, or a contingency table, from given numbers: using the numbers from the server-tipping task further below, there are several ways to build the matrix or contingency table that oddsRatio, relrisk or stats::prop.test expects.

Writing in a matrix by hand.

The help file for prop.test asks for:

a vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively.

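So the success and failure counts need to be in columns, with one row per group. The command matrix fills column by column by default, so when the counts are listed group by group we need byrow = TRUE.

server.tip = matrix(c(40,69-40,130,349-130), byrow = TRUE, ncol = 2)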

Task Print out this matrix, to see where each entry falls.
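
To see how the fill order decides where each count lands, here is a minimal sketch using the same counts. The object names by_col and by_row and the labels "yes" and "no" are hypothetical, chosen only for this illustration.

```r
# Same counts as in the text: 40 of 69 and 130 of 349 tipped
counts <- c(40, 69 - 40, 130, 349 - 130)

# Default: fill down the columns, so each group's counts land in one column
by_col <- matrix(counts, ncol = 2)

# byrow = TRUE: fill across the rows, so each group's counts land in one row
by_row <- matrix(counts, ncol = 2, byrow = TRUE,
                 dimnames = list(shirt = c("red", "other"),
                                 tipped = c("yes", "no")))

print(by_col)
print(by_row)
```

Only by_row has the successes in column 1 and the failures in column 2, which is the layout the prop.test help file describes. Adding dimnames is optional, but it makes the printed matrix much easier to check.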

The help file for oddsRatio asks for:

"Successes" should be located in column 1 of x, and the treatment of interest should be located in row 2.

Here, the treatment of interest is wearing a red shirt, so the red-shirt counts must go in row 2:

server.tip.OR = matrix(c(130,349-130,40,69-40), byrow = TRUE, ncol = 2)

Task Print out this matrix, to see where each entry falls.

Using data frames and xtabs

Instead of writing out the matrix by hand, we can work with a data frame and the command xtabs to create a contingency table:

server.tip.df = data.frame(
  color = c("red","red","other","other"),
  tip = c(TRUE,FALSE,TRUE,FALSE),
  count = c(40, 69-40, 130, 349-130)
)
server.tip.xt = xtabs(count ~ color+tip, data=server.tip.df)

Task Print out this table, to see where each entry falls.

Transposing

You may have to transpose or rearrange the matrix to meet the requirements of prop.test and oddsRatio, respectively. The function t transposes a matrix or table for you.

Task Print out both server.tip.xt and t(server.tip.xt) to see how they differ.
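
Transposing swaps rows and columns, but it does not change the order of the labels within them. The sketch below rebuilds the table from the text so that it runs on its own, then shows both t and reordering by name. The object names transposed and reordered are hypothetical.

```r
# Rebuild the table from the text so this block runs on its own
server.tip.df <- data.frame(
  color = c("red", "red", "other", "other"),
  tip   = c(TRUE, FALSE, TRUE, FALSE),
  count = c(40, 69 - 40, 130, 349 - 130)
)
server.tip.xt <- xtabs(count ~ color + tip, data = server.tip.df)

# xtabs sorts the labels, so rows are other, red and columns are FALSE, TRUE
dimnames(server.tip.xt)

# t() swaps rows and columns; it does not change the order within them
transposed <- t(server.tip.xt)

# Indexing by name reorders rows and columns without transposing
reordered <- server.tip.xt[c("other", "red"), c("TRUE", "FALSE")]

print(transposed)
print(reordered)
```

In reordered, the TRUE column (the tippers) comes first and the red-shirt row comes second, which is the layout the oddsRatio help file asks for.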


A study examined the relationship between the color of the server's shirt and whether or not the customer tipped. In the study, 69 customers were served by a server in a red shirt and 349 by a server in a shirt of a different color. Of the 69, 40 tipped; of the 349, 130 tipped.

Task Is the two sample proportions test appropriate?

Task Produce a 95% confidence interval for the difference in proportions of tippers.

Task Test the null hypothesis "The same proportion of customers tips when served by a red shirt as when served by a non-red shirt."

Task Calculate the odds ratio or the relative risk. Write out an interpretation of the result.
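
As a reminder of what the two quantities measure, here is a hand calculation in base R. The counts and the group labels are made up for illustration and are not the lab data. Functions such as oddsRatio and relrisk may order the comparison differently, so check their help files before comparing results.

```r
# Hypothetical 2x2 counts for illustration only (not the lab data)
tab <- matrix(c(20, 30,    # control:   20 successes, 30 failures
                30, 20),   # treatment: 30 successes, 20 failures
              ncol = 2, byrow = TRUE,
              dimnames = list(group = c("control", "treatment"),
                              outcome = c("success", "failure")))

# Proportion of successes in each group
p <- tab[, "success"] / rowSums(tab)

# Relative risk: ratio of the success proportions, treatment over control
rel_risk <- p[["treatment"]] / p[["control"]]

# Odds ratio: ratio of the odds of success, treatment over control
odds <- tab[, "success"] / tab[, "failure"]
odds_ratio <- odds[["treatment"]] / odds[["control"]]

print(rel_risk)
print(odds_ratio)
```

With these made-up counts the relative risk is about 1.5, read as "the treatment group succeeds about 1.5 times as often as the control group", while the odds ratio of about 2.25 compares the odds of success rather than the proportions.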


Task Load the dataset faithful of eruptions of the Old Faithful geyser in Yellowstone National Park using the command data(faithful). The data set contains faithful$eruptions, the duration of each eruption in minutes, and faithful$waiting, the waiting time in minutes until the next eruption starts.

Task Test the alternative hypothesis "The median waiting time to the next eruption is greater than 70 minutes" against the null hypothesis "The median is equal to 70 minutes."

Do not use the Kruskal-Wallis test for one-sample median testing; it compares two or more groups with each other, not one sample against a fixed value.
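
Two one-sample alternatives available in base R are the sign test (through binom.test) and the Wilcoxon signed-rank test (wilcox.test). Which one this course intends is an assumption here. The sketch below uses a made-up sample, not the faithful data.

```r
# Hypothetical sample for illustration only (not the faithful data)
x  <- c(62, 71, 75, 68, 80, 77, 73, 64, 82, 74, 61, 81)
m0 <- 70   # median under the null hypothesis

# Sign test: drop values equal to m0, then count the values above it
above  <- sum(x > m0)
n_used <- sum(x != m0)

# Under the null, 'above' is Binomial(n_used, 0.5); test for a larger median
sign_test <- binom.test(above, n_used, p = 0.5, alternative = "greater")

# Wilcoxon signed-rank test, which additionally assumes a symmetric distribution
signed_rank <- wilcox.test(x, mu = m0, alternative = "greater")

print(sign_test$p.value)
print(signed_rank$p.value)
```

The sign test only uses whether each observation lies above or below the hypothesized median, so it makes the weaker assumptions of the two.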