Detailed class plan
For this class, most of the facts you take in will come from reading the textbook. You can expect a workload of approximately 30 pages each week (mean: 29; std. dev.: 6.7; 5-number summary: 20 / 24.25 / 27.5 / 35 / 40), as well as approximately 10 assigned exercises.
The overall structure of our class meetings will be:
- Mondays:
  - I clarify any issues that showed up in your self-reflections.
  - We discuss the points that the text raises for reflection and discussion.
  - I introduce additional important or helpful material and perspectives.
  - Peer evaluation of homework exercises.
- Wednesdays, classroom:
  - I go through the concepts and computer programming issues relevant for the lab of the day.
- Wednesdays, computer lab:
  - You work through a lab sheet that introduces and teaches specific techniques in RStudio.
You may optionally work in Python/PyLab. I am happy to help – this is the environment I work the most in – but I will not be able to talk about the Python approach in the Wednesday lecture part.
Lab content
- Get to know RStudio. Using ggplot2 for graphs. Histograms, scatterplots, jitter, alpha-channel. Reproducibility & random seeding. Dataset loading.
- Compute summary statistics. Explore failures of Mean/Variance measures. Plotting PDFs. Computing PDFs, CDFs, correlation coefficients. Dataset summaries.
- Experiment design, sampling design: Data provenance. N/A representation. N/A handling. -999 as null value. Mrs. Null.
- Inference, Ethics, Causation: Anscombe quartet.
- Randomness, models, random variables
- Least squares: Computing and examining LSQ fits and summaries.
- Error types: base rate bias.
- Sampling distributions: Polling data?
- Confidence intervals, significance: Data fishing, torturing a data set. Polling data.
- Proportional inference: Exit polls/results.
- Means inference
- Two-way tables; chi-squared
- Linear regression
- ANOVA
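
As a taste of the lab item on -999 as a null value, here is a minimal sketch in Python (the optional environment mentioned above; the labs themselves use RStudio). The readings are made-up illustrative numbers, not course data, and the names SENTINEL, readings, and valid are chosen only for this example. It shows how a sentinel value left in the data distorts the mean, and how marking it as missing first gives a sensible summary.

```python
import math
import statistics

# Assumption: -999 is the sentinel that this (hypothetical) dataset uses for "no measurement".
SENTINEL = -999

# Made-up illustrative readings; two of the six are missing.
readings = [12.5, 14.0, SENTINEL, 13.5, SENTINEL, 15.0]

# Naive summary: the sentinel is treated as a real measurement and drags the mean down.
naive_mean = statistics.mean(readings)

# Make missing values explicit as NaN, then drop them before summarizing.
cleaned = [math.nan if x == SENTINEL else x for x in readings]
valid = [x for x in cleaned if not math.isnan(x)]
clean_mean = statistics.mean(valid)

print(f"naive mean: {naive_mean:.2f}")
print(f"clean mean: {clean_mean:.2f} (from {len(valid)} of {len(readings)} values)")
```

The design choice here is to convert the sentinel to an explicit missing marker before any computation, so that later steps cannot mistake it for data; reporting how many values were dropped keeps the summary honest.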