Second report: inferential statistics

Your second report will test the hypotheses you have registered in Blackboard. The report you write will be eligible for the Undergraduate Class Project Competition, and I encourage you to participate. I will happily give you feedback on adapting your report for submission.

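You will test your hypotheses on the data you did not use to generate your hypotheses. You can select this subset with:

# Seed the random number generator so the split is reproducible.
set.seed(Last4DigitsFromYourStudentIDNumber)
# Shuffle the row numbers, then drop the first half of the shuffled rows.
datasubset <- dataset[-sample(nrow(dataset))[1:(nrow(dataset)/2)], ]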
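
To see what the indexing above does, here is a minimal sketch that runs the same split step by step. It assumes the built-in iris data as a stand-in for your dataset and uses a made-up seed (1234); the names shuffled and dropped are only for illustration.

```r
# Stand-in data (assumption): replace iris with your own dataset.
dataset <- iris

# Hypothetical seed: replace 1234 with the last four digits of your student ID.
set.seed(1234)

# Shuffle all row numbers, then take the first half of the shuffled order.
shuffled <- sample(nrow(dataset))
dropped  <- shuffled[1:(nrow(dataset) / 2)]

# Negative indexing removes those rows and keeps the rest.
datasubset <- dataset[-dropped, ]

# Sanity checks: the kept rows and the dropped rows do not overlap,
# and together they account for every row of the dataset.
overlap <- intersect(rownames(datasubset), rownames(dataset)[dropped])
length(overlap)
nrow(datasubset) + length(dropped) == nrow(dataset)
```

Because the seed is fixed, rerunning these lines always gives the same rows, which is what lets the split be reproduced when your report is knitted.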

Criteria for C

Your report will…

Criteria for D

Up to four tasks with minor errors. For example: the report file does not knit, but errors are relatively easy to fix; report has grammatical or spelling errors; etc.

Criteria for F

Not handing a report in on time. Omitting one of the instructed tasks completely. Handing in a report where any knitting errors prove difficult or time-consuming to correct. Handing in a report where four or more of the criteria for C have minor errors.

Criteria for A

To achieve the grade of A, your report will also …

First report: descriptive statistics

Your first report will describe your dataset in detail, with special focus on a handful of variables in the data.

Criteria for C

Your report will be readable, written in English, without grammatical or spelling errors. It will be submitted as an RMarkdown file, together with all data files needed to knit the report into a finished text. Your RMarkdown file will run on the lab computers without errors.

Your report will describe general information about the dataset, including:

You will include a suggestion of the kinds of questions the data was collected to answer. For instance, the Iris data we have looked at could be said to have been collected to find methods for determining the species of flower specimens.

You will describe the layout of the data:

You will pick a handful of variables for more careful study. You should pick no more than 5 each of numeric, categorical and date/time-like variables. Fewer is fine, and if you don’t have any dates or timestamps, you clearly will not pick any of those.
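
One way to get an overview of which variables fall in which group is sketched below. It assumes your data is in a data frame called dataset (iris is used here as a stand-in) and that categorical variables are stored as factors or character strings; your own data may need conversion first.

```r
# Stand-in data (assumption): replace iris with your own dataset.
dataset <- iris

# Numeric variables.
numeric_vars <- names(dataset)[sapply(dataset, is.numeric)]

# Categorical variables, assuming they are stored as factor or character.
categorical_vars <- names(dataset)[
  sapply(dataset, function(col) is.factor(col) || is.character(col))
]

# Date/time-like variables, assuming they are stored as Date or POSIXt.
datetime_vars <- names(dataset)[
  sapply(dataset, function(col) inherits(col, c("Date", "POSIXt")))
]

numeric_vars
categorical_vars
datetime_vars
```

Dates that were read in as plain text will show up among the categorical variables here, so check the lists by eye before you choose.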

For each of the picked variables, produce a detailed description of its distribution. Include an appropriate plot to show how the values of the variable are distributed. Where appropriate, evaluate whether you think the variable has a normal distribution. Explain your reasoning.
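
As a rough sketch of what such a description could be built from, the code below looks at one numeric and one categorical variable from the Iris data. The choice of variables, plots and the Shapiro-Wilk test are illustrative assumptions, not the required set; pick what suits your own variables.

```r
# Numeric variable: numerical summary plus a histogram.
x <- iris$Sepal.Length
summary(x)
sd(x)
hist(x, main = "Sepal length", xlab = "Sepal length (cm)")

# Normality: a normal Q-Q plot, with a reference line through the quartiles.
qqnorm(x, main = "Normal Q-Q plot: sepal length")
qqline(x)

# Shapiro-Wilk test as one possible numerical check of normality.
sw <- shapiro.test(x)
sw$p.value

# Categorical variable: counts per level and a bar plot.
species_counts <- table(iris$Species)
species_counts
barplot(species_counts, main = "Specimens per species", ylab = "Count")
```

The plots and the test only provide evidence; the report still needs your own written argument for or against normality.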

For each pair of picked variables, describe how they relate to each other. Where appropriate, produce plots, correlations, and two-way tables.
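
Which tool is appropriate depends on the types of the two variables. The sketch below, again on the Iris data, shows one possibility for each combination. Since Iris has only one categorical variable, the second one (size) is made up by binning petal length at arbitrary cut points, purely for illustration.

```r
# Numeric vs numeric: scatter plot and Pearson correlation.
plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
r <- cor(iris$Sepal.Length, iris$Petal.Length)
r

# Numeric vs categorical: side-by-side box plots.
boxplot(Sepal.Length ~ Species, data = iris,
        ylab = "Sepal length (cm)")

# Categorical vs categorical: two-way table.
# size is a hypothetical variable created by binning petal length.
size <- cut(iris$Petal.Length,
            breaks = c(0, 2, 5, Inf),
            labels = c("short", "medium", "long"))
tab <- table(iris$Species, size)
tab
```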

Criteria for D

Up to four tasks with minor errors. For example: the report file does not knit, but errors are relatively easy to fix; report has grammatical or spelling errors; etc.

Criteria for F

Not handing a report in on time. Omitting one of the instructed tasks completely. Handing in a report where any knitting errors prove difficult or time-consuming to correct. Handing in a report where four or more of the criteria for C have minor errors.

More criteria may be added if needed.

Criteria for A

Include, fit and comprehensively describe models for all variable relationships. Give plots, coefficients, the proportion of variance explained, and a written and explained evaluation of the quality of each model.
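
For a pair of numeric variables, a simple linear regression is one possible model. The sketch below, on the Iris data, shows where the coefficients and the proportion of explained variance come from; the choice of a linear model and of these two variables is an assumption for illustration.

```r
# Fit a simple linear model: petal length explained by sepal length.
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Coefficients (intercept and slope).
coef(fit)

# Proportion of explained variance (R squared).
r2 <- summary(fit)$r.squared
r2

# Plot the data with the fitted line.
plot(Petal.Length ~ Sepal.Length, data = iris,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
abline(fit)

# Residuals against fitted values, to help judge the quality of the model.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Visible structure in the residual plot suggests the model misses something, which is worth discussing in your written evaluation.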

Include a discussion that compares and contrasts possible presentations for your data: what options were you choosing between for plotting, for choosing measures of spread and center, and why did you pick the ones you used?
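
To make such a comparison concrete, you could compute the competing measures side by side and look at alternative plots of the same variable. A minimal sketch, assuming petal length from the Iris data as the variable under discussion:

```r
x <- iris$Petal.Length

# Competing measures of center.
center <- c(mean = mean(x), median = median(x))
center

# Competing measures of spread.
spread <- c(sd = sd(x), IQR = IQR(x), MAD = mad(x))
spread

# Two presentations of the same variable, side by side.
old_par <- par(mfrow = c(1, 2))
hist(x, main = "Histogram", xlab = "Petal length (cm)")
boxplot(x, horizontal = TRUE, main = "Box plot",
        xlab = "Petal length (cm)")
par(old_par)
```

Where the mean and median, or the two plots, tell noticeably different stories, that difference is a good starting point for explaining your choice.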

Include a critique of the original data collection. Given your understanding of why the data was collected, does the data support learning about the questions you believe it was collected to answer? What would you like to see included in the dataset to better respond to these questions? Leave feasibility aside: what is your wish list for extending the data?