Welcome to MTH 513
We will be covering Machine Learning, using a competition-based approach to the topic. The class will be divided into teams that participate in competitions on Kaggle.
The course meets Mondays and Wednesdays, 14.30 - 16.25, in 1S-110.
Your instructor is Prof. Mikael Vejdemo-Johansson. On this website you will find lecture notes for the lectures given during the course. Most of the course will be devoted to lab time working on the competitions, with occasional lectures interspersed.
We will use An Introduction to Statistical Learning as a reference book. It is available for free as an ebook, and through the library you can find it on SpringerLink, where a printed copy can be ordered for $25.
Contact
Mikael.VejdemoJohansson@csi.cuny.edu
1S-208
Office hours: Mondays and Wednesdays, 12.45 - 14.15.
Competitions
- Titanic - MVJ got 0.80382
- House price regression - MVJ got 0.50378
- Forest Cover Type classification - MVJ got 0.58489
- Fashion MNIST - MVJ got 0.84466
Example solutions
Course Blog
The course is primarily graded on writing blog posts describing, in detail, the process and the solution for one of the team's competition submissions.
The blog is here on Medium.
A blog post should include a narrative exposition of the entire process for its competition. In particular it should include
- What models were tried? What did they score? Why were models or model components discarded?
- What auxiliary information (exploratory plots, etc.) was used to decide?
- What data transformations were tried? What determined which ones to keep and which ones to discard?
- What was the final pipeline? What did it score?
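One way to make these questions easy to answer at write-up time is to keep a running log of every attempt as you go. The following is a minimal sketch, not a required format: the `Experiment` class, its field names, and the helper functions are all hypothetical, the entries are placeholders rather than real results, and `best_kept` assumes a metric where higher is better (which does not hold for error-based regression scores).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Experiment:
    """One attempted pipeline. All field names here are hypothetical."""
    model: str            # model and key settings, e.g. "kNN, k=5"
    transformations: str  # data transformations applied before the model
    score: float          # validation or leaderboard score
    kept: bool            # whether this variant survived to the next round
    reason: str = ""      # why it was kept or discarded


def best_kept(log: List[Experiment]) -> Optional[Experiment]:
    """Return the highest-scoring kept experiment, or None if there is none.

    Assumes a metric where higher is better.
    """
    kept = [e for e in log if e.kept]
    return max(kept, key=lambda e: e.score) if kept else None


def to_table(log: List[Experiment]) -> str:
    """Render the log as a pipe-separated table, in the order tried."""
    rows = [
        "Model | Transformations | Score | Kept | Reason",
        "---|---|---|---|---",
    ]
    for e in log:
        kept = "yes" if e.kept else "no"
        rows.append(
            f"{e.model} | {e.transformations} | {e.score:.5f} | {kept} | {e.reason}"
        )
    return "\n".join(rows)


# Placeholder entries for illustration only; not real competition results.
log = [
    Experiment("kNN, k=5", "none", 0.70, False, "baseline only"),
    Experiment("kNN, k=5", "standard scaling", 0.75, True, "scaling helped"),
    Experiment("random forest", "standard scaling", 0.78, True, "best so far"),
]

print(to_table(log))
print("Final pipeline:", best_kept(log).model)
```

A log like this gives the narrative its skeleton: each row is a decision point, and the reason column records why a model or transformation was kept or discarded.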
To create your submission, first make sure you have a Medium account. Next, send your username to the professor so that you can be added to the blog's writing crew. Finally, write up your blog post and, when you are done, use the [...] menu at the bottom to submit it to the course blog publication.
The blog post is due at midnight following the final exam on May 15. Feedback will be given if a draft is shown before May 8.
Final Exam
The final exam is on May 15. For the final, any handwritten notes you bring are allowed, but no other documentation.
Lecture slides
Lecture | Date | Content |
---|---|---|
1 | 2019-01-28 | Welcome, basic concepts of Machine Learning, our first Kaggle submission |
2 | 2019-01-30 | Decision boundaries, sklearn pipelines, kNN classifier |
3 | 2019-02-04 | Validation, regression |
4 | 2019-02-06 | Cross-validation, bootstrap |
5 | 2019-02-11 | PCA |
6 | 2019-02-13 | Grid Search |
7 | 2019-02-20 | Decision Trees, Random Forests, Bagging, Boosting |
8 | 2019-02-25 | Stacking |
9 | 2019-03-06 | Multi-class Classifiers |
10 | 2019-03-11 | Multi-class Classifiers |
11 | 2019-03-13 | Kernels |
12 | 2019-03-20 | SVM Classifiers |
13 | 2019-03-25 | Neural Networks, TensorFlow, Keras |
14 | 2019-03-27 | CNNs, Backpropagation |
15 | 2019-04-03 | Regularization, Pooling, Dropout for NNs |
16 | 2019-04-08 | Optimizers |
17 | 2019-04-10 | Ethics |
18 | 2019-04-15 | Bayesian Statistics |
19 | 2019-04-17 | Bayesian Classifiers |
20 | 2019-04-29 | word2vec and autoencoders |
21 | 2019-05-01 | Recurrent neural networks |
22 | 2019-05-06 | Reinforcement Learning |
Exam Review
Some exam review questions can be found here.