For the second report, your work starts by creating a list of interesting hypotheses to study. These should be of the following kinds, to match up with the tests you will be learning.

I want you to make your hypotheses based on the same subset of data you used for the first report. If your dataset is called `dataset`, then you will subset your data with the following code:

```r
set.seed(Last4DigitsFromYourStudentIDNumber)
datasubset = subset(dataset, 1:nrow(dataset) %in% sample.int(nrow(dataset), nrow(dataset)/2))
```

and then base your hypotheses on the values from that dataset.
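
As a quick check of what the subsetting step does, here is a minimal sketch on a toy data frame. The data frame, its column names, and the seed `1234` are made up for illustration; use your own dataset and the last four digits of your own student ID.

```r
# Toy stand-in for your own data; `dataset` and its columns are hypothetical.
dataset <- data.frame(
  id    = 1:10,
  value = c(3.1, 4.7, 5.0, 2.8, 6.3, 4.4, 5.9, 3.6, 4.1, 5.2)
)

set.seed(1234)  # placeholder: use the last 4 digits of your student ID
keep <- sample.int(nrow(dataset), nrow(dataset) / 2)  # row numbers to keep

# Same rows as subset(dataset, 1:nrow(dataset) %in% keep), in original order.
datasubset <- dataset[sort(keep), ]
nrow(datasubset)  # half of the original number of rows

# Re-running with the same seed selects exactly the same rows.
set.seed(1234)
keep_again <- sample.int(nrow(dataset), nrow(dataset) / 2)
identical(keep, keep_again)
```

Because the seed fixes the random draw, you get the same subset every time you run the code, which is what lets you reuse the subset from the first report.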

Notice that even if you suspect two values to be almost the same, the test you would perform is to test for inequality, and then take a failure to reject the null hypothesis as support for the values being similar (but not necessarily equal).
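
To illustrate the point about similar values, here is a small sketch of a two-sided test. The two groups and their numbers are made up, and the significance level of 0.05 is an assumption; use whatever level you choose for your report.

```r
# Made-up measurements for two hypothetical groups; not from any real dataset.
group_a <- c(5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7)
group_b <- c(5.0, 4.9, 5.2, 5.1, 4.8, 5.3, 5.0, 4.9)

# H0: the two means are equal. H1: the two means are not equal (two-sided).
result <- t.test(group_a, group_b, alternative = "two.sided")
result$p.value

# Assumed significance level; pick your own before looking at the data.
alpha <- 0.05

# TRUE here means we fail to reject H0: the data are compatible with
# similar means, but this does not prove that the means are equal.
result$p.value > alpha
```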

You should make at least 5 different kinds of hypotheses from this list.

• `mean(datasubset$Variable)` is [not equal to] / [larger than] / [smaller than] `M` for some specific value `M`. Example hypothesis: `mean(beer$PercentAlcohol) > 5`.
• `mean(datasubset$Variable1[SomeCondition])` is [not equal to] / [larger than] / [smaller than] `mean(datasubset$Variable2[SomeOtherCondition])`, where `Variable1` may be the same as `Variable2` or different. This is a type of hypothesis that can be inspired by plots like `ggplot(datasubset, aes(x=Variable1, color=GroupVariable)) + geom_freqpoly()` and seeing different values of the (categorical) `GroupVariable` producing different or similar distributions. Example hypothesis: `mean(beer$PercentAlcohol[beer$Brewery=="Sierra Nevada"]) > mean(beer$PercentAlcohol[beer$Brewery=="Flying Dog Brewery "])`
• The proportion of some label of a categorical variable is [not equal to] / [larger than] / [smaller than] `P` for some specific value `P`. Example hypothesis: At least 45% of likely voters plan to vote for Trump.
• The proportion of one specific label is [not equal to] / [larger than] / [smaller than] the proportion of another specific label. Example hypothesis: A larger proportion of likely voters plan to vote for Clinton than for Trump.
• There is a relation between the rows and the columns of a two-way table. Example hypothesis: Gender affects the choice of lunch food.
• The frequencies of labels in one particular variable follow a particular distribution. Example hypothesis: this six-sided die is fair: each number has equal probability of occurring.
• There is a linear relationship between a specific pair of variables. Example hypothesis: Calories and alcohol content of beer are related through a linear equation: `PercentAlcohol = b1*Calories + b0` where `b1≠0`.
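
To give a feel for how hypotheses like these might later be tested, here is a rough sketch using base R functions (`t.test`, `prop.test`, `chisq.test`, `lm`). It assumes these are the tests the kinds above map to, which may differ from what we cover in class. All data and variable names are made up, and the sketch covers most but not all of the kinds listed.

```r
# All data below are made up for illustration; variable names are hypothetical.
set.seed(1)
alcohol  <- rnorm(40, mean = 5.5, sd = 1)      # numeric variable
calories <- 30 * alcohol + rnorm(40, sd = 10)  # numeric, related to alcohol
brewery  <- rep(c("A", "B"), each = 20)        # categorical variable

# One mean against a fixed value M = 5 ("larger than").
one_mean <- t.test(alcohol, mu = 5, alternative = "greater")

# Two group means against each other ("not equal to").
two_means <- t.test(alcohol[brewery == "A"], alcohol[brewery == "B"],
                    alternative = "two.sided")

# One proportion against a fixed value P = 0.45:
# 230 out of 500 hypothetical respondents chose the label of interest.
one_prop <- prop.test(x = 230, n = 500, p = 0.45, alternative = "greater")

# Relation between the rows and columns of a two-way table.
lunch <- matrix(c(20, 15, 10, 25), nrow = 2,
                dimnames = list(gender = c("F", "M"),
                                food   = c("salad", "burger")))
two_way <- chisq.test(lunch)

# Label frequencies against a stated distribution: 60 rolls of a die,
# compared with equal probability 1/6 for each face.
rolls <- c(8, 12, 9, 11, 10, 10)
fair_die <- chisq.test(rolls, p = rep(1 / 6, 6))

# Linear relationship: alcohol = b1 * calories + b0, testing b1 != 0.
fit <- lm(alcohol ~ calories)
slope_p <- summary(fit)$coefficients["calories", "Pr(>|t|)"]

# Collect the p-values side by side.
c(one_mean = one_mean$p.value, two_means = two_means$p.value,
  one_prop = one_prop$p.value, two_way = two_way$p.value,
  fair_die = fair_die$p.value, slope = slope_p)
```

Each call returns an object whose `p.value` you would compare with your chosen significance level. For now, the point is only to see which form of data each kind of hypothesis needs.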

# Due dates

To support each other in formulating hypotheses, you will read an assigned fellow student's report and suggest ideas to them based on what you read. These ideas are due March 27.

You then need to formulate your hypotheses and register them with me through Blackboard. Your hypotheses are due March 29.

A draft of your report for feedback, as with the first report, is welcome by April 30.

Your report is due through Blackboard timestamped no later than 14.30 on May 15.