Lab 16: Geometry in Higher Dimensions
One of the learning objectives of Math 314 is to expand your mathematical horizons into thinking about things in more than one variable. Thus far in Math 314 we have restricted ourselves to thinking about things in either 2-dimensional or 3-dimensional space. This is helpful and natural, because we live in a 3-dimensional world, but as dimensionality increases, some of our intuition about what “should” be true breaks down. In this lab we explore some fun facts about higher dimensional space.
Fact 1
The volume of the unit ball eventually gets smaller as the dimension of the space gets bigger.
First, a brief notational aside. When we measure things in 1-D we usually return a length, in 2-D an area, and in 3-D a volume. Using a different name for the measurement in each dimension is unsustainable, so we will use the generic word “volume”. We also talk about “balls” in \(n\)-dimensional space. Although in 3-D this looks like the ball you are familiar with, in general we define a ball as the set of all points within a certain distance of a central point. When we talk about unit balls, we mean all the points at distance \(1\) or less from (typically) the origin.
Our intuition may lead us to believe that as dimensionality increases, the volume of a unit ball should also increase. The following table gives us good reason to think so.
Dimension | Unit Ball | Volume |
---|---|---|
1 | \([-1,1]\) | \(2\) |
2 | \(\{ (x,y) : x^2+y^2 \leq 1 \}\) | \(\pi\) |
3 | \(\{ (x,y,z) : x^2+y^2+z^2 \leq 1 \}\) | \(\frac 43\pi\) |
Surprisingly, this trend does not continue as \(n\) increases.
Task 1
Check that the volume of the unit ball actually decreases as \(n\) gets larger by using your code from Task 1 in the Monte Carlo integration lab.
Write a function, `plot_ball_volumes()`, that creates a plot with the dimension on the `x`-axis and the volume on the `y`-axis. Let the dimension of the space vary from `n = 1` to `n = 20`. Remember to label your plot and axes. Make sure to use a large enough number of samples `N`.
For what value of `n` is the volume of the unit ball the highest?
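The Monte Carlo lab code is not reproduced here, so the following is only a minimal sketch of what such an estimator could look like. The names `ball_volume_mc` and `ball_volume_exact` are hypothetical and are not functions from the lab. The sketch samples points uniformly from the cube \([-1,1]^n\), counts the fraction that land in the unit ball, and scales by the cube's volume \(2^n\). The well-known closed form \(\pi^{n/2}/\Gamma(n/2+1)\) is included only as a sanity check on the estimate.

```python
import math
import random


def ball_volume_mc(n, N=100_000, seed=0):
    """Monte Carlo estimate of the unit n-ball volume (hypothetical helper)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(N):
        # A point of the cube [-1, 1]^n is in the ball when its squared norm is <= 1.
        if sum(rng.uniform(-1, 1) ** 2 for _ in range(n)) <= 1:
            inside += 1
    # Fraction of hits times the volume of the enclosing cube, 2^n.
    return (2 ** n) * inside / N


def ball_volume_exact(n):
    """Closed form pi^(n/2) / Gamma(n/2 + 1), used here only as a check."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


# Compare the estimate with the closed form in a few low dimensions.
for n in range(1, 4):
    print(n, ball_volume_mc(n, N=20_000), ball_volume_exact(n))
```

In high dimensions almost no samples from the cube land inside the ball, which is why the number of samples `N` matters: with too few, the estimate collapses to zero.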
Fact 2
In higher dimensions most of the volume is concentrated near the boundary.
You may have noticed when doing the prior problem that as `n` got larger, the percentage of points inside the cube that were also inside the ball decreased.
As it turns out, in higher dimensions most of the volume is concentrated at the boundary.
There are a few different exercises one can do to get intuition as to why this is true.
First, consider an `n`-dimensional unit cube (i.e., a cube-like shape in `n` dimensions where each side has length `1`).
Suppose you were asked to draw a cube, centered inside that `n`-dimensional cube, that contained half of the volume.
What would be the side length of that cube? For a few values of `n` we have the following table.
Dimension | Side Length | Approximate Value |
---|---|---|
1 | \(1/2\) | \(0.5\) |
2 | \(1/\sqrt{2}\) | \(0.7071\) |
3 | \(1/\sqrt[3]{2}\) | \(0.7937\) |
4 | \(1/\sqrt[4]{2}\) | \(0.8409\) |
5 | \(1/\sqrt[5]{2}\) | \(0.8706\) |
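The side lengths in the table come from solving \(s^n = 1/2\) for \(s\), which gives \(s = 2^{-1/n}\). A short sketch of that computation follows; `half_volume_side` is a hypothetical helper name used only for illustration.

```python
def half_volume_side(n):
    """Side length s of a cube with s**n == 1/2 (hypothetical helper)."""
    return 0.5 ** (1.0 / n)


# Reproduce the rows of the table above.
for n in range(1, 6):
    print(n, round(half_volume_side(n), 4))
```

Because \(2^{-1/n}\) approaches \(1\) as \(n\) grows, the inner cube must reach almost to the faces of the unit cube to hold half of the volume. The other half sits in an ever thinner shell near the boundary.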
Task 2
Write a function, `plot_side_lengths()`, that plots the following: for \(n=1, \ldots, 30\), plot the dimension on the \(x\)-axis and, on the \(y\)-axis, the side length of the centered \(n\)-dimensional cube that contains exactly half the volume of the unit \(n\)-dimensional cube.
Fact 3
The volume of the unit ball is concentrated at the equator.
For the previous fact we considered taking volume out of the center of the cube. For this fact we consider what happens if we cut a sphere into thin slices. Our next interesting fact is that the volume of a unit ball is concentrated at the equator. If you think about this claim in lower dimensions (i.e., 2 or 3 dimensions), it does not seem like it could possibly be true. However, if you think about the ideas from Fact 2, a slice through the center of the sphere has the most boundary.
As you may have noticed from Facts 1 and 2, drawing points inside a sphere is hard to do efficiently, so for this fact we will consider points on the surface of a sphere. First, we will write some code that generates points uniformly on an `n`-dimensional sphere.
Task 3a
Write a function, `generate(n, d)`, that uniformly samples `n` points on the surface of a `d`-dimensional sphere. Do this by first sampling `n` points from `[-1, 1]^d`, then normalizing them with respect to the `2`-norm (i.e., divide each point by its Euclidean magnitude).
Hints:
- You can sample `n` points from `[-1,1]^d` using `x = np.random.uniform(-1, 1, (d, n))`.
- You can normalize `x` by using `x /= np.linalg.norm(x, ord=2, axis=0)`.
- Return `x.T` for the proper dimensions, `(n, d)`.
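Put together, the three hints could be assembled roughly as follows. This is a sketch that follows the hints literally, not a reference solution.

```python
import numpy as np


def generate(n, d):
    """Sample n points on the surface of the d-dimensional unit sphere."""
    # Each column is one point drawn from the cube [-1, 1]^d.
    x = np.random.uniform(-1, 1, (d, n))
    # Divide every column by its Euclidean length to push it onto the sphere.
    x /= np.linalg.norm(x, ord=2, axis=0)
    # Transpose so that each row is one point: shape (n, d).
    return x.T


points = generate(5, 3)
print(points.shape)                    # one row per sampled point
print(np.linalg.norm(points, axis=1))  # every row should have length 1
```

One caveat: normalizing points drawn from a cube slightly over-represents directions toward the cube's corners, so the result is only approximately uniform on the sphere. If exact uniformity matters, a common alternative is to normalize standard normal samples instead.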
Task 3b
Write a function, `equators(d, m, n, tol)`, that uses your `generate` function to sample `m` points from the surface of the `d`-dimensional unit sphere and then samples `n` additional points from the surface of the `d`-dimensional unit sphere. Treat each of the `n` points as a pole and compute its equatorial region of width `tol`. Return both the proportion of the `m` sampled points that fall within at least one equatorial band and the proportion of the `m` points that fall within all of the equatorial bands. Make sure that your function returns two objects of type `float` and not `np.float32` or `np.float64` (consider wrapping your results with `float()`).
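One possible shape for this function is sketched below. The lab does not spell out how the band of width `tol` is measured, so the sketch makes an assumption: a point `p` lies in the equatorial band of a pole `q` when `|p · q| <= tol / 2`, that is, when it is within `tol / 2` of the hyperplane perpendicular to `q`. If your course defines the band differently (for example `|p · q| <= tol`), adjust that one line. The `generate` sketch is repeated so that the block runs on its own.

```python
import numpy as np


def generate(n, d):
    """Sample n points on the d-dimensional unit sphere (as in Task 3a)."""
    x = np.random.uniform(-1, 1, (d, n))
    x /= np.linalg.norm(x, ord=2, axis=0)
    return x.T


def equators(d, m, n, tol):
    points = generate(m, d)  # shape (m, d): the points being tested
    poles = generate(n, d)   # shape (n, d): each row defines one equator
    # dots[i, j] is the signed distance of point i from the equatorial
    # hyperplane of pole j (both vectors have unit length).
    dots = points @ poles.T
    # ASSUMPTION: a band of total width tol means |distance| <= tol / 2.
    in_band = np.abs(dots) <= tol / 2
    in_any = in_band.any(axis=1).mean()  # inside at least one band
    in_all = in_band.all(axis=1).mean()  # inside every band
    return float(in_any), float(in_all)


print(equators(d=100, m=1000, n=5, tol=0.4))
```

Computing all dot products with a single matrix product avoids an explicit double loop over points and poles.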
Fact 4
Vectors drawn from high dimensional Gaussians are almost orthogonal.
For this final interesting fact, we will take a (very) brief aside into statistics. One of the most useful distributions in probability and statistics is the normal distribution, sometimes referred to as the Gaussian distribution, or even the bell curve. This distribution describes many phenomena in both the physical and social sciences, for example the distribution of heights or the distribution of wealth. The multidimensional probability density function of the normal distribution is given by
\[ f(x) = C \exp\left( -\tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right) \]
where \(\mu\) is a vector of the means, \(\Sigma\) is the covariance matrix, and \(C\) is a constant dependent on \(\Sigma\) that guarantees the probability density function integrates to one.

As one can see from the figure, the most probable values are those located close to the mean. A natural generalization of this idea is that if we were to independently draw multiple variables, each from a Gaussian distribution, the resulting vector would be close to the overall mean, and two vectors drawn in this way would be very close to each other. In fact, the opposite happens. When the number of independent variables is large, vectors drawn from a high dimensional Gaussian are nearly orthogonal. Let’s see this by coding.
Task 4a
Write a function, `gaussian_angles(d, n, tol)`, that samples `n` vectors from the `d`-dimensional standard normal distribution (using `np.random.normal(size=(n, d))`), then samples an additional vector, `x`, from the `d`-dimensional standard normal distribution, and returns an array of the angles (in degrees) between each sample and `x`. Use `np.linalg.norm` when computing the angles.
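A minimal sketch of the angle computation follows, using the identity \(\cos\theta = \frac{u \cdot x}{\|u\|\,\|x\|}\). This lab calls the function both with and without `tol`, so the sketch accepts `tol` as an optional, unused argument. That is an assumption made here so either call form works, not something the lab specifies.

```python
import numpy as np


def gaussian_angles(d, n, tol=None):
    """Angles in degrees between n Gaussian samples and one more sample x.

    tol is accepted but unused (assumption, see the note above).
    """
    samples = np.random.normal(size=(n, d))  # one sample per row
    x = np.random.normal(size=d)             # the additional reference vector
    # Cosine of the angle between each row and x.
    cosines = (samples @ x) / (np.linalg.norm(samples, axis=1) * np.linalg.norm(x))
    # Guard against round-off pushing a cosine slightly outside [-1, 1].
    cosines = np.clip(cosines, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))


print(gaussian_angles(2, 5))     # low dimension: angles spread widely
print(gaussian_angles(1000, 5))  # high dimension: compare with 90 degrees
```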
Task 4b
Using your code from the previous exercise, write a function, `gaussian_orthogonality(d, n, tol)`, that computes the proportion of the angles found by `gaussian_angles(d, n)` that fall within a given angle tolerance, `tol`, of being orthogonal. In other words, find the proportion of angles that satisfy `|theta - 90| < tol`, where `theta` is a given angle.
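The proportion described above reduces to the mean of a boolean mask. The sketch below repeats a compact `gaussian_angles` so that it runs on its own; treat it as an illustration under the same assumptions as Task 4a, not as the required implementation.

```python
import numpy as np


def gaussian_angles(d, n):
    """Angles in degrees between n Gaussian samples and one extra sample."""
    samples = np.random.normal(size=(n, d))
    x = np.random.normal(size=d)
    cosines = (samples @ x) / (np.linalg.norm(samples, axis=1) * np.linalg.norm(x))
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))


def gaussian_orthogonality(d, n, tol):
    """Proportion of angles theta with |theta - 90| < tol (tol in degrees)."""
    angles = gaussian_angles(d, n)
    # The mean of a boolean array is the fraction of True entries.
    return float(np.mean(np.abs(angles - 90.0) < tol))


# The proportion should grow with the dimension d for a fixed tolerance.
for d in (2, 10, 100, 1000):
    print(d, gaussian_orthogonality(d, 1000, tol=5.0))
```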
Task 4c
Use your code from the previous exercises to write a function, `plot_gaussian_angles(d, n)`, that creates a histogram of the angles generated by `gaussian_angles(d, n)`. Make sure to label your plot and its axes and select an appropriate bin width. Use `density=True` for your histogram so your plot resembles a probability distribution.