Third lecture task

Use whichever platform you prefer to work on – we will give references and hints for R and for Python.

Do one of the following tasks:

Task

What is the most economical way you can store the entire distance matrix for a particular dataset?
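As one possible starting point, here is a minimal sketch of a "condensed" layout. It assumes the distance is symmetric with a zero diagonal, so only the entries with i < j are stored. The class name `CondensedDistances` and its index helper are illustrative and do not come from the lecture; whether this layout is the most economical one is left for you to argue.

```python
import math


class CondensedDistances:
    """Stores only the n*(n-1)/2 distances with i < j (hypothetical helper)."""

    def __init__(self, points, dist=math.dist):
        self.n = len(points)
        # Row-major order over the strict upper triangle:
        # (0,1), (0,2), ..., (0,n-1), (1,2), ...
        self.values = [
            dist(points[i], points[j])
            for i in range(self.n)
            for j in range(i + 1, self.n)
        ]

    def _index(self, i, j):
        # Position of the pair (i, j), with i < j, in the flat list.
        return self.n * i - i * (i + 1) // 2 + (j - i - 1)

    def __call__(self, i, j):
        if i == j:
            return 0.0  # assumed: zero diagonal
        if i > j:
            i, j = j, i  # assumed: symmetry
        return self.values[self._index(i, j)]


points = [(0, 0), (3, 4), (6, 8), (0, 1)]
d = CondensedDistances(points)
print(len(d.values), d(0, 1), d(2, 0))
```

For comparison, `scipy.spatial.distance.pdist` in Python and `dist` in R return distances in a similarly condensed form.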

Assume no storage overhead and 16-bit numbers: how large a dataset can you comfortably handle with a 4 GB memory allocation?
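The arithmetic can be sketched in a few lines. The sketch reads the memory budget as 4 GiB = 2^32 bytes, which is an assumption (a decimal 4 * 10^9 bytes gives a slightly smaller answer), and it compares a full n x n matrix with a layout that stores only the n(n-1)/2 entries above the diagonal. The function names are illustrative. What counts as "comfortable" is a separate judgment, since the rest of your program needs memory too.

```python
import math

BUDGET_BYTES = 4 * 2**30   # assumption: "4G" read as 4 GiB
BYTES_PER_ENTRY = 2        # 16-bit numbers


def max_points_full(budget_bytes, bytes_per_entry=BYTES_PER_ENTRY):
    """Largest n such that n*n entries fit in the budget."""
    max_entries = budget_bytes // bytes_per_entry
    return math.isqrt(max_entries)


def max_points_condensed(budget_bytes, bytes_per_entry=BYTES_PER_ENTRY):
    """Largest n such that n*(n-1)/2 entries fit in the budget."""
    max_entries = budget_bytes // bytes_per_entry
    # Solve n*(n-1)/2 <= max_entries using exact integer arithmetic.
    n = (1 + math.isqrt(1 + 8 * max_entries)) // 2
    while n * (n - 1) // 2 > max_entries:  # guard against off-by-one
        n -= 1
    return n


print("full matrix:     ", max_points_full(BUDGET_BYTES))
print("condensed matrix:", max_points_condensed(BUDGET_BYTES))
```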

Task

One commonly used kernel is the Gaussian or RBF kernel with positive bandwidth parameter \(\sigma\).

$$ K(x,y) = \exp\left[-\frac{\|x-y\|^2}{2\sigma^2}\right] $$
  • Show that this kernel is a kernel in the sense of the lecture slides.
  • Is \(K\) a metric? Prove it or provide a counterexample.
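Before attempting the proofs, a quick numerical experiment can help build intuition. The sketch below assumes that "kernel in the sense of the lecture slides" means a symmetric function whose Gram matrices are positive semidefinite; check the slides for the exact definition. The helper names (`rbf_kernel`, `gram_matrix`, `quadratic_form`) are illustrative. A numerical check on random points is evidence, not a proof.

```python
import math
import random


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma**2))


def gram_matrix(points, sigma=1.0):
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


def quadratic_form(gram, c):
    """Computes sum_ij c_i K_ij c_j for a coefficient vector c."""
    n = len(c)
    return sum(c[i] * gram[i][j] * c[j] for i in range(n) for j in range(n))


rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
gram = gram_matrix(points, sigma=1.5)

# Symmetry of the Gram matrix.
symmetric = all(
    abs(gram[i][j] - gram[j][i]) < 1e-12
    for i in range(len(points))
    for j in range(len(points))
)

# Smallest value of the quadratic form over random coefficient vectors.
min_form = min(
    quadratic_form(gram, [rng.gauss(0, 1) for _ in points]) for _ in range(200)
)

print("symmetric:", symmetric)
print("smallest quadratic form seen:", min_form)
print("K(x, x) for the first point:", rbf_kernel(points[0], points[0]))
```

Comparing the printed value of K(x, x) with what the metric axioms require of d(x, x) is a useful first step for the second bullet.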