Semester Schedule

We meet Tuesdays 9.30-11.30 in GC:4422.

I expect you to do the reading before each lecture and to work on the assigned tasks after it.

Reading plan

This is a preliminary reading schedule. We will work with Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning (abbreviated HTF in the schedule; available from Amazon or directly from Tibshirani’s homepage), and we will supplement it with papers and texts from other sources.

The course moves through three distinct phases:

  1. What can we do to find structure in an unknown dataset?
    Exploratory data analysis, Visualization, Association rules, Clustering.
  2. How do we know that what we did makes sense?
    Model selection, Bias/Variance tradeoff, Cross-validation, Bootstrap.
  3. What can we do with the results?
    Classification methods (Random forests, Logistic regression, Support vector machines, Discriminant analysis).

| Day | Lecture | Content | Reading | Tasks |
|---|---|---|---|---|
| Jan 30 | 1 | Exploratory Data Analysis and Visualization | HTF Chapter 1. The NIST Handbook on EDA: skip the alphabetic list of graphical techniques; refresh your knowledge of univariate, 1-factor and regression plots; skim chapter 3 – keep it in mind as a reference | First lecture task |
| Feb 6 | 2 | Association Rules | HTF Chapter 14.1-2 | Second lecture task |
| Feb 13 | 3 | Distances; data as a metric space; intro to clustering | A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data; HTF Chapter 14.3.1-5 | Third lecture task |
| Feb 27 | 4 | k-means; k-medoids; nearest-neighbors; vector quantization | HTF Chapter 13, 14.3.6-11 | Fourth lecture task |
| Mar 6 | 5 | Hierarchical clustering, Topological Data Analysis | HTF Chapter 14.3.12 | Fifth lecture task |
| Mar 13 | 6 | Models, model selection, tradeoffs | HTF Chapter 7.1-3, 7.10-11 | No homework this week |
| Mar 20 | 7 | SOM and PCA | HTF Chapter 14.4-6 | Seventh lecture task |
| Mar 27 | 8 | ICA, MDS and non-linear dimension reduction | HTF Chapter 14.7-9 | Eighth lecture task |
| Apr 10 | 9 | Linear Classification | HTF Chapter 4 | Ninth lecture task |
| Apr 17 | 10 | Support Vector Machines | HTF Chapter 12.1-2 | No homework this week |
| Apr 24 | 11 | Random Forests | HTF Chapter 15 | Eleventh lecture task |
| May 1 | 12 | Neural Networks and TensorFlow | | No homework this week |
| May 8 | 13 | Report deadline; Presentations | | No homework this week |
| May 15 | 14 | Presentations | | |