Fifth lecture task

Use whichever platform you prefer to work on – we will give references and hints for R and for Python.

Do one of the following tasks:

Task

Implement either agglomerative or divisive clustering. Run your algorithm on the iris dataset.
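As a starting point, agglomerative clustering can be sketched in a few lines of plain Python. This is a minimal, deliberately naive version and not a reference solution: it assumes single linkage and Euclidean distance, neither of which the task prescribes, and the function name `agglomerative` and the toy data are made up for illustration.

```python
import math

def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering (illustrative sketch).

    Returns a list of clusters, each a sorted list of point indices.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest pair.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        # Merge cluster q into cluster p; q > p, so index p stays valid.
        clusters[p] = sorted(clusters[p] + clusters[q])
        clusters.pop(q)
    return clusters

# Hypothetical toy data: two small, well-separated groups of 2-D points.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
toy_clusters = agglomerative(toy, 2)
```

The same function accepts the 150 iris measurements as a list of rows. In Python they can be obtained, for example, from `sklearn.datasets.load_iris().data` if scikit-learn is installed. Swapping `min` for `max` in `linkage` would give complete linkage instead.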

Task

Load the first 1000 points from the MNIST handwritten digits dataset. (If you are brave, work with all 60,000 points in this data file.)

The first variable is the true digit; variables 2 through 785 are the 784 pixel values of the 28 × 28 image, listed row by row.
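The layout described above could be read in Python roughly as follows. This is a sketch under assumptions: it supposes the file is comma-separated with no header row, which should be checked against the actual download, and the helper name `load_digits_csv` is made up. Because the file name is not fixed here, the demonstration uses a tiny in-memory stand-in instead of the real data.

```python
import io
import numpy as np

def load_digits_csv(source, n_rows=1000):
    """Read the first n_rows rows and return (labels, pixels).

    `source` is a path or an open text file. Assumes comma-separated
    values without a header row (an assumption about the file format).
    """
    data = np.atleast_2d(np.loadtxt(source, delimiter=",", max_rows=n_rows))
    labels = data[:, 0].astype(int)  # variable 1: the true digit
    pixels = data[:, 1:]             # variables 2..785: pixel values
    return labels, pixels

# In-memory stand-in for the real file: 3 rows of label + 784 zero pixels.
fake_rows = "\n".join(",".join([str(d)] + ["0"] * 784) for d in (5, 0, 4))
labels, pixels = load_digits_csv(io.StringIO(fake_rows), n_rows=2)

# Pixels are stored row by row, so a plain reshape recovers the image.
first_image = pixels[0].reshape(28, 28)
```

With the real file, `source` would simply be the path to the downloaded data file.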

Run k-means clustering with k = 10 on the data. Can you identify a unique true digit for each cluster?

Run hierarchical clustering on the data. Do you see a large gap between successive merge heights in the dendrogram (indicating well-separated clusters)? How many clusters do you find?
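A hedged sketch of how the gap could be located numerically with SciPy is given below. It assumes Ward linkage, which the task does not prescribe, and cuts the tree in the middle of the largest jump between consecutive merge heights. The helper name `clusters_from_largest_gap` and the synthetic data are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def clusters_from_largest_gap(X, method="ward"):
    """Cut the hierarchy at the largest jump in merge heights.

    Needs at least 3 points. Returns (n_clusters, labels, heights).
    """
    Z = linkage(X, method=method)
    heights = Z[:, 2]            # height of each of the n - 1 merges
    gaps = np.diff(heights)      # jump between consecutive merges
    i = int(gaps.argmax())
    # Stopping after merge i leaves n - (i + 1) clusters.
    n_clusters = len(X) - (i + 1)
    # Cut halfway inside the gap.
    threshold = (heights[i] + heights[i + 1]) / 2
    labels = fcluster(Z, t=threshold, criterion="distance")
    return n_clusters, labels, heights

# Synthetic demo data (not MNIST): two tight, far-apart blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
n_found, labels, heights = clusters_from_largest_gap(X)
```

Plotting `heights`, or drawing the tree with `scipy.cluster.hierarchy.dendrogram`, shows the same gap visually. On real data the largest gap may sit near the very top of the tree and be less clear-cut than in this toy example, so it is worth looking at the plot before trusting the number.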