Welcome to MTH 513

We will cover machine learning using a competition-based approach. The class will be divided into teams that participate in competitions on Kaggle.

The course meets Mondays and Wednesdays, 14.30 - 16.25, in 1S-110.

Your instructor is Prof. Mikael Vejdemo-Johansson. Lecture notes for the course will be posted on this website. Most of the course will be devoted to lab time working on the competitions, with occasional lectures interspersed.

We will use An Introduction to Statistical Learning as a reference book. It is available for free as an ebook; through the library you can also find it on SpringerLink, where a printed copy can be ordered for $25.

Contact

Mikael.VejdemoJohansson@csi.cuny.edu
1S-208
Office hours Monday, Wednesday, 12.45 - 14.15.

Competitions

Example solutions

michiexile kernels

Course Blog

The course is primarily graded on writing blog posts describing, in detail, the process and the solution for one of the team's competition submissions.

The blog is here on Medium.

A blog post should include a narrative exposition of the entire process for its competition. In particular it should include

  • What models were tried? What did they score? Why were models or model components discarded?
  • What auxiliary information (exploratory plots, etc.) was used to decide?
  • What data transformations were tried? What determined which ones to keep and which ones to discard?
  • What was the final pipeline? What did it score?
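
As an illustration of the kind of record these questions ask for, here is a minimal sketch that scores a few candidate pipelines on identical cross-validation folds, so the post can report what each one scored and why it was kept or discarded. Everything in it is an assumption for illustration: the iris dataset stands in for competition data, and the three candidate pipelines and their names are hypothetical, not a required or recommended set.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption); a real post would use the competition's training set.
X, y = load_iris(return_X_y=True)

# Hypothetical candidates: each combines data transformations with a model.
candidates = {
    "kNN (raw features)": make_pipeline(KNeighborsClassifier(n_neighbors=5)),
    "scaling + kNN": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "scaling + PCA(2) + kNN": make_pipeline(
        StandardScaler(), PCA(n_components=2), KNeighborsClassifier(n_neighbors=5)
    ),
}

# Fixed, seeded folds so every candidate is scored on the same splits.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Map each candidate name to (mean accuracy, standard deviation across folds).
results = {}
for name, pipeline in candidates.items():
    scores = cross_val_score(pipeline, X, y, cv=folds, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())

# Print the candidates from best to worst mean accuracy.
for name, (mean, std) in sorted(
    results.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{name:<24} accuracy = {mean:.3f} +/- {std:.3f}")

best_name = max(results, key=lambda name: results[name][0])
```

A table like the one this prints, together with the reasoning behind each keep-or-discard decision, covers most of the questions above. Note that local cross-validation accuracy is only a proxy for the score a submission receives on Kaggle, so a post should report both.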

To create your submission, first make sure you have a Medium account. Next, send your username to the professor so that you can be included in the blog writing crew. Finally, write up your blog post and, when you are done, use the [...] menu at the bottom to submit it to the course blog publication.

The blog post is due at midnight following the final exam on May 15. Feedback will be given if a draft is shown before May 8.

Final Exam

The final exam is on May 15. You may bring any handwritten notes to the final, but no other documentation is allowed.

Lecture slides

Lecture Date Content
1 2019-01-28 Welcome, basic concepts of Machine Learning, our first Kaggle submission
2 2019-01-30 Decision boundaries, sklearn pipelines, kNN classifier
3 2019-02-04 Validation, regression
4 2019-02-06 Cross-validation, bootstrap
5 2019-02-11 PCA
6 2019-02-13 Grid Search
7 2019-02-20 Decision Trees, Random Forests, Bagging, Boosting
8 2019-02-25 Stacking
9 2019-03-06 Multi-class Classifiers
10 2019-03-11 Multi-class Classifiers
11 2019-03-13 Kernels
12 2019-03-20 SVM Classifiers
13 2019-03-25 Neural Networks, Tensorflow, Keras
14 2019-03-27 CNNs, Backpropagation
15 2019-04-03 Regularization, Pooling, Dropout for NNs
16 2019-04-08 Optimizers
17 2019-04-10 Ethics
18 2019-04-15 Bayesian Statistics
19 2019-04-17 Bayesian Classifiers
20 2019-04-29 word2vec and autoencoders
21 2019-05-01 Recurrent neural networks
22 2019-05-06 Reinforcement Learning

Exam Review

Some exam review questions can be found here