Formal Definition
A topological space is a manifold if it can be equipped with an atlas: an open cover in which each element is homeomorphic to (an open subset of) \(\mathbb{R}^n\).
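A standard worked example, not taken from the notes themselves, is the circle \(S^1 = \{(x, y) : x^2 + y^2 = 1\}\). Writing \(p_N = (0, 1)\) and \(p_S = (0, -1)\) for the two poles (names chosen here for illustration), stereographic projection gives an atlas with two elements:

\[
U_N = S^1 \setminus \{p_N\}, \quad \varphi_N(x, y) = \frac{x}{1 - y},
\qquad
U_S = S^1 \setminus \{p_S\}, \quad \varphi_S(x, y) = \frac{x}{1 + y}.
\]

Each of \(\varphi_N\) and \(\varphi_S\) is a homeomorphism onto \(\mathbb{R}\), and \(U_N \cup U_S = S^1\), so the circle fits the definition with \(n = 1\).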
Intuitive Description
A shape is a manifold if it has the same dimension everywhere and has no self-intersections, boundaries, or other kinds of pathological behavior.
Manifolds are one particular way of making precise what it means for a shape to be nice in a theoretical sense.
Geometric/Topological machine learning and data science operate under the manifold hypothesis: an assumption that data typically lies on (or near) a relatively low-dimensional manifold embedded in a higher-dimensional ambient space.
Example: linear regression assumes that the data lies near a hyperplane.
If we can shift from the ambient dimension to the intrinsic dimension, the data may be much easier to work with.
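A minimal sketch of the gap between ambient and intrinsic dimension, assuming synthetic data: points are generated near a 2-dimensional plane inside \(\mathbb{R}^{10}\), and PCA (computed with NumPy's SVD) is used to see how many directions carry the variance. The dimensions, noise level, and the 1% variance cutoff are illustrative assumptions, and PCA only detects linear structure, so this would not recover the dimension of a curved manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the notes).
ambient_dim, intrinsic_dim, n_points = 10, 2, 500

# Orthonormal basis of a random 2-dimensional plane inside R^10.
basis, _ = np.linalg.qr(rng.normal(size=(ambient_dim, intrinsic_dim)))

# Points on the plane, plus a small amount of noise in every ambient direction.
coords = rng.normal(size=(n_points, intrinsic_dim))
data = coords @ basis.T + 0.01 * rng.normal(size=(n_points, ambient_dim))

# PCA via the singular values of the centered data matrix.
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Count the directions that explain more than 1% of the variance (arbitrary cutoff).
estimated_dim = int(np.sum(explained > 0.01))
print(np.round(explained, 4))
print("estimated intrinsic dimension:", estimated_dim)
```

Here the first two variance ratios dominate and the remaining eight are close to zero, so projecting onto the first two principal directions loses very little.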
For visualization purposes, it would be best to get down to the range of 2-5 actually relevant dimensions, to limit the number of visual channels we need to use.
Dealing with high-dimensional data
Divide into three groups.
Each group picks one (a different one) of:
From a database (collected by van Hateren) of naturally occurring images, draw 3x3 pixel patches at random.
Most such pixel patches will be almost constant - discard those.
D. Mumford et al. used PCA to identify a primary circle of high density in this data. This circle turns out to trace linear gradients in different orientations.
G. Carlsson et al. used Topological Data Analysis to study the high-density structure more carefully. They identified a high-density Klein bottle in the data, with a direct correspondence to quadratic gradients (ridges and valleys) in different orientations.
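The patch sampling and filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the cited studies: a synthetic image stands in for the van Hateren database, `sample_patches` is a hypothetical helper, contrast is measured as the norm of the mean-centered patch, and the top 20% by contrast are kept. The synthetic image is not expected to reproduce the circle or the Klein bottle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a natural image (assumption): the van Hateren database is not
# loaded here. A double cumulative sum of noise gives spatially correlated pixels.
image = np.cumsum(np.cumsum(rng.normal(size=(256, 256)), axis=0), axis=1)

def sample_patches(image, n_patches, rng, size=3):
    """Hypothetical helper: draw size x size patches at random, flattened to vectors."""
    rows = rng.integers(0, image.shape[0] - size + 1, size=n_patches)
    cols = rng.integers(0, image.shape[1] - size + 1, size=n_patches)
    return np.stack([image[r:r + size, c:c + size].ravel()
                     for r, c in zip(rows, cols)])

patches = sample_patches(image, 5000, rng)  # shape (5000, 9)

# Contrast of a patch: norm of the patch after subtracting its mean intensity.
centered = patches - patches.mean(axis=1, keepdims=True)
contrast = np.linalg.norm(centered, axis=1)

# Discard the almost-constant patches: keep the top 20% by contrast (arbitrary cutoff).
keep = contrast >= np.quantile(contrast, 0.8)
high_contrast = centered[keep] / contrast[keep, None]  # unit-norm, mean-zero patches

# PCA on the surviving patches, as a first look at their structure.
cols_centered = high_contrast - high_contrast.mean(axis=0)
_, singular_values, components = np.linalg.svd(cols_centered, full_matrices=False)
print(high_contrast.shape)
print(np.round(singular_values, 3))
```

Because every surviving patch has mean zero and unit norm, the points lie on a 7-dimensional sphere inside an 8-dimensional subspace of \(\mathbb{R}^9\), which is why the last singular value is numerically zero.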
Read https://handsondataviz.org/how-to-lie-with-charts.html