Detailed class plan

For this class, most of the facts you take in will come from reading the textbook. Expect a workload of approximately 30 pages each week (mean: 29; std. dev.: 6.7; five-number summary: 20 / 24.25 / 27.5 / 35 / 40), as well as approximately 10 assigned exercises.

The overall structure of our class meetings will be:

Mondays:

I clarify any issues that showed up in your self-reflections.
We discuss the points that the text raises for reflection and discussion.
I introduce additional important or helpful material and perspectives.
You peer-evaluate the homework exercises.

Wednesdays, classroom:

I go through the concepts and programming issues relevant to the day's lab.

Wednesdays, computer lab:

You work through a lab sheet that introduces and teaches specific techniques in RStudio.

You may optionally work in Python/PyLab. I am happy to help – this is the environment I work in the most – but I will not be able to cover the Python approach in the Wednesday classroom session.

Lab content

Get to know RStudio. Using ggplot2 for graphs. Histograms, scatterplots, jitter, alpha-channel. Reproducibility & random seeding. Dataset loading.
Compute summary statistics. Explore failures of Mean/Variance measures. Plotting PDFs. Computing PDFs, CDFs, correlation coefficients. Dataset summaries.
Experiment design, sampling design: Data provenance. N/A representation. N/A handling. -999 as null value. Mrs. Null.
Inference, Ethics, Causation: Anscombe's quartet.
Randomness, models, random variables
Least squares: Computing and examining LSQ fits and summaries.
Error types: base rate bias.
Sampling distributions: Polling data?
Confidence intervals, significance: Data fishing, torturing a data set. Polling data.
Proportional inference: Exit polls/results.
Means inference
Two-way tables; chi-squared
Linear regression
ANOVA
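
As a preview of the missing-data lab, here is a minimal Python sketch of why "-999 as null value" needs handling before you compute summaries. It is an illustration under my own assumptions, not the lab sheet itself: the readings are made up, and the names SENTINEL, mark_missing and mean_of_present are hypothetical.

```python
import statistics

# Assumption: the data set codes missing values as -999.
SENTINEL = -999


def mark_missing(values, sentinel=SENTINEL):
    """Replace the sentinel code with None so it cannot pass as a number."""
    return [None if v == sentinel else v for v in values]


def mean_of_present(values):
    """Mean over the non-missing entries only."""
    present = [v for v in values if v is not None]
    return statistics.mean(present)


# Made-up readings for illustration; the third one is "missing".
readings = [12, 15, -999, 14, 13]

# Treating -999 as a real measurement drags the mean far below every real value.
naive_mean = statistics.mean(readings)

# Marking the sentinel as missing first gives a mean of the four real readings.
clean_mean = mean_of_present(mark_missing(readings))

print("naive mean:", naive_mean)
print("mean with -999 treated as missing:", clean_mean)
```

Dropping the missing entries is only one possible handling strategy; whether it is appropriate depends on why the values are missing, which is part of what the lab discusses.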