Manifolds are one way of making precise what it means for a shape to be "nice": a manifold is a space that, near every point, looks like ordinary Euclidean space of some fixed dimension.
Geometric and topological machine learning and data science operate under the manifold hypothesis: the assumption that data typically lies on (or near) a relatively low-dimensional manifold embedded in a higher-dimensional ambient space.
Example: linear regression assumes the data lies near a hyperplane.
If we can work in the intrinsic dimension of the manifold rather than the ambient dimension, the data may become much easier to handle.
For visualization, it is best to get down to roughly 2-5 relevant dimensions, to limit the number of visual channels we need to use.
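As a minimal sketch of the gap between ambient and intrinsic dimension, the example below places synthetic points near a line in 3-dimensional space and measures how much of the total variance a single principal direction captures. The data, the noise level, and the helper name `top_component_share` are illustrative assumptions, and power iteration stands in for a full PCA routine.

```python
import math
import random


def top_component_share(points, iterations=200):
    """Fraction of the total variance captured by the first principal direction."""
    n = len(points)
    dim = len(points[0])
    # Centre the data on its mean.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centred = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(row[a] * row[b] for row in centred) / (n - 1)
            for b in range(dim)] for a in range(dim)]
    # Power iteration: repeatedly apply cov and renormalise to approach
    # the eigenvector with the largest eigenvalue.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    # Rayleigh quotient gives the variance along v; the trace is the total variance.
    top = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim))
    total = sum(cov[a][a] for a in range(dim))
    return top / total


# Synthetic data (an assumption): ambient dimension 3, intrinsic dimension 1.
rng = random.Random(0)
direction = (1.0, 2.0, -1.0)
points = []
for _ in range(500):
    t = rng.uniform(-5, 5)
    points.append([t * d + rng.gauss(0, 0.1) for d in direction])

share = top_component_share(points)
print(f"variance explained by the first component: {share:.4f}")
```

A share close to 1 suggests that a single coordinate along the line describes each point almost as well as all three ambient coordinates do.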
Dealing with high-dimensional data
Divide into three groups.
Each group picks a different one of:
From a database (collected by van Hateren) of naturally occurring images, draw 3x3 pixel patches at random.
Most such pixel patches will be almost constant; discard those and keep only the high-contrast patches.
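The sampling and filtering step can be sketched as follows. No real image database is loaded here: a small synthetic image stands in for a natural image, and the contrast measure (population standard deviation of the nine pixel values) and the threshold are illustrative assumptions, not the choices made in the studies cited in these notes.

```python
import random
import statistics


def sample_patches(image, count, rng):
    """Draw `count` random 3x3 patches, each flattened to a 9-vector."""
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(count):
        r = rng.randrange(height - 2)  # top-left corner row
        c = rng.randrange(width - 2)   # top-left corner column
        patches.append([image[r + dr][c + dc]
                        for dr in range(3) for dc in range(3)])
    return patches


def contrast(patch):
    """Illustrative contrast measure: spread of the nine pixel values."""
    return statistics.pstdev(patch)


def keep_high_contrast(patches, threshold):
    """Discard patches that are almost constant."""
    return [p for p in patches if contrast(p) > threshold]


rng = random.Random(0)
# Stand-in image (an assumption): dark left half, bright right half, slight noise.
image = [[(0.2 if c < 32 else 0.8) + rng.gauss(0, 0.005) for c in range(64)]
         for _ in range(64)]

patches = sample_patches(image, 2000, rng)
kept = keep_high_contrast(patches, threshold=0.1)
print(f"sampled {len(patches)} patches, kept {len(kept)} high-contrast ones")
```

Only the patches that straddle the edge survive the filter, which mirrors the observation that most patches are nearly constant.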
D. Mumford et al. used PCA to identify a primary circle of high density in this data. The circle turns out to trace linear gradients in different orientations.
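To see why linear gradients in different orientations form a circle, here is a small sketch that builds idealized 3x3 gradient patches. The pixel coordinates and the function name `linear_gradient_patch` are assumptions made for illustration; the patches are synthetic, not drawn from image data.

```python
import math

# Pixel-centre coordinates of a 3x3 patch, listed row by row.
GRID = [(x, y) for y in (1, 0, -1) for x in (-1, 0, 1)]


def linear_gradient_patch(theta):
    """9-vector of a linear intensity ramp increasing in direction theta."""
    return [x * math.cos(theta) + y * math.sin(theta) for x, y in GRID]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


# Every gradient patch is cos(theta) * horizontal + sin(theta) * vertical.
horizontal = linear_gradient_patch(0.0)
vertical = linear_gradient_patch(math.pi / 2)
dot = sum(h * v for h, v in zip(horizontal, vertical))

angles = [2 * math.pi * k / 12 for k in range(12)]
norms = [norm(linear_gradient_patch(t)) for t in angles]

print("dot product of the two basis patches:", dot)
print("norms at 12 orientations:", [round(n, 6) for n in norms])
```

The two basis patches are orthogonal and every orientation has the same norm, so as the angle varies the patches sweep out a circle inside a 2-dimensional plane of the 9-dimensional patch space.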
G. Carlsson et al. used Topological Data Analysis to study the high-density structure more carefully. They identified a high-density Klein bottle in the data, with a direct correspondence to quadratic gradients (ridges and valleys) in different orientations.
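One way to see where a Klein bottle can come from is to parametrize idealized patches by an orientation angle and a second angle that mixes a quadratic (ridge or valley) term with a linear term. The parametrization below is an assumed model chosen for illustration, and `quadratic_patch` is a hypothetical helper; the sketch only checks the gluing rule numerically.

```python
import math

# Pixel-centre coordinates of a 3x3 patch, listed row by row.
GRID = [(x, y) for y in (1, 0, -1) for x in (-1, 0, 1)]


def quadratic_patch(theta, phi):
    """Patch c*(a*x + b*y)**2 + d*(a*x + b*y), with (a, b) and (c, d) on unit circles."""
    a, b = math.cos(theta), math.sin(theta)
    c, d = math.cos(phi), math.sin(phi)
    return [c * (a * x + b * y) ** 2 + d * (a * x + b * y) for x, y in GRID]


def max_abs_diff(p, q):
    return max(abs(u - v) for u, v in zip(p, q))


theta, phi = 0.7, 1.1
p = quadratic_patch(theta, phi)

# Turning the orientation by half a turn and negating phi gives the same patch.
same_gap = max_abs_diff(p, quadratic_patch(theta + math.pi, -phi))
# Turning the orientation alone flips the linear term, giving a different patch.
other_gap = max_abs_diff(p, quadratic_patch(theta + math.pi, phi))

print("gap under (theta + pi, -phi):", same_gap)
print("gap under (theta + pi,  phi):", other_gap)
```

The pair of angles lives on a torus, and identifying (theta, phi) with (theta + pi, -phi) is the gluing that turns a torus into a Klein bottle. Setting phi to pi/2 removes the quadratic term and leaves plain linear gradients in different orientations.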
Read https://handsondataviz.org/how-to-lie-with-charts.html