Detailed class plan

For this class, most of the material will reach you through reading the textbook. Expect a workload of approximately 30 pages each week (mean: 29; std. dev.: 6.7; five-number summary: 20 / 24.25 / 27.5 / 35 / 40), plus approximately 10 assigned exercises.
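
For anyone curious how a workload summary like the one above is computed, here is a minimal Python sketch. The page counts in it are made up for illustration and are not the actual reading schedule. The helper name `five_number_summary` is hypothetical, and the choice of inclusive quartile interpolation is an assumption about how the figures above were produced.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    # method="inclusive" interpolates between the observed data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical weekly page counts, NOT the real reading schedule.
pages = [20, 22, 25, 27, 30, 33, 36, 40]

print("mean:", statistics.mean(pages))
print("std. dev.:", round(statistics.stdev(pages), 1))  # sample std. dev.
print("five-number summary:", five_number_summary(pages))
```

Different quartile conventions give slightly different Q1 and Q3 values on small data sets, so a summary is only reproducible if the convention is stated.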

The overall structure of our class meetings will be:

  1. Mondays:
  2. Wednesdays, classroom:
  3. Wednesdays, computer lab:

You may optionally work in Python/PyLab. I am happy to help, since this is the environment I work in most, but I will not be able to cover the Python approach in the Wednesday lecture part.

Lab content

  1. Get to know RStudio. Using ggplot2 for graphs. Histograms, scatterplots, jitter, alpha-channel. Reproducibility & random seeding. Dataset loading.
  2. Compute summary statistics. Explore failures of Mean/Variance measures. Plotting PDFs. Computing PDFs, CDFs, correlation coefficients. Dataset summaries.
  3. Experiment design, sampling design. Data provenance. N/A representation. N/A handling. -999 as null value. Mrs. Null.
  4. Inference, ethics, causation. Anscombe's quartet.
  5. Randomness, models, random variables
  6. Least squares. Computing and examining LSQ fits and summaries.
  7. Error types: base rate bias.
  8. Sampling distributions. Polling data?
  9. Confidence intervals, significance. Data fishing, torturing a data set. Polling data.
  10. Proportional inference. Exit polls/results.
  11. Means inference
  12. Two-way tables; chi-squared
  13. Linear regression
  14. ANOVA
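
As a rough preview of the missing-value topics in lab 3, the following Python sketch shows why a sentinel such as -999 has to be handled before computing any summary. It is an illustration only: the labs themselves use RStudio, the raw column is made up, and the set of markers and both helper names (`parse_measurement`, `mean_ignoring_missing`) are hypothetical choices, not a convention any particular data set is known to follow.

```python
import math
import statistics

# Hypothetical missing-value markers; real data sets document their own.
MISSING_MARKERS = {"", "NA", "N/A", "-999"}


def parse_measurement(text):
    """Convert a raw text field to float, mapping missing markers to NaN."""
    cleaned = text.strip()
    if cleaned in MISSING_MARKERS:
        return math.nan
    return float(cleaned)


def mean_ignoring_missing(values):
    """Mean of the non-missing values (raises if every value is missing)."""
    present = [v for v in values if not math.isnan(v)]
    return statistics.mean(present)


# Made-up raw column, for illustration only.
raw = ["12.5", "-999", "13.0", "N/A", "12.0"]

# Naive approach: drop the text marker but treat -999 as a real measurement.
naive = statistics.mean([float(x) for x in raw if x != "N/A"])
clean = mean_ignoring_missing([parse_measurement(x) for x in raw])

print("naive mean (sentinel treated as data):", naive)
print("mean with missing values removed:", clean)
```

The naive mean is dragged far below every real measurement by the single sentinel, which is the kind of silent failure the lab is meant to expose.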