Chapter 1 covered mathematical theorems from calculus and some definitions:
An example of a $C^1$ function is $F(x) = \int_a^x f(u) du$ when $f$ is continuous. This example can be re-integrated to get examples of $C^k$ functions.
The intermediate value theorem
The extreme value theorem
The mean value theorem. (We formulated this in an extended sense due to Cauchy.)
Let $f$ be a $C^{k+1}$ function on $[a,b]$. Let $a < c < b$ and $x$ be in $(a,b)$. Then there exists a $\xi$ between $c$ and $x$ satisfying:

$$~ f(x) = T_k(x) + \frac{f^{(k+1)}(\xi)}{(k+1)!} (x - c)^{k+1}, ~$$
where $T_k(x) = f(c) + f'(c)(x-c) + \cdots + f^{(k)}(c)/k! \cdot (x-c)^k$.
$T_k$ is the Taylor polynomial of degree $k$ for $f$ about $c$.
There are other ways to write the remainder term, and somewhat relaxed assumptions on $f$ that are possible, but this is the easiest to remember.
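As a quick numeric check of the remainder bound, here is a small illustration (not an example from the text) using $f(x) = e^x$ about $c = 0$:

```julia
# Check the remainder term for f(x) = exp(x) about c = 0 (so ξ lies in [0, x]).
f(x) = exp(x)
T(k, x) = sum(x^j / factorial(j) for j in 0:k)      # the degree-k Taylor polynomial
x, k = 0.5, 4
actual = f(x) - T(k, x)
bound  = exp(x) * x^(k+1) / factorial(k + 1)        # since exp(ξ) ≤ exp(x) for 0 ≤ ξ ≤ x
actual, bound, actual <= bound
```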
A less precise, but still true, form of the above is:
$$~ f(x) = T_k(x) + \mathcal{O}((x-c)^{k+1}). ~$$

There exists a $\xi$ in $[a,b]$ with $f(\xi) = 0$.
There exists a $\xi$ in $[a,b]$ with $f'(\xi) = 0$.
There exists a $\xi$ in $[a,b]$ with $f'(\xi) \cdot (b-a) = f(b) - f(a)$.
What are the assumptions on $f$ used in your proof?
at $c=0$
at $c=0$
at $c=0$
at $c=0$
We needed a very large value of $k$. What if we tried this over a smaller interval, say $0 \leq \xi \leq 1/2$, instead? How big would $k$ need to be then?
We used $f^{(k)}(x) = \pm 1 / (k (1+x)^k)$.
Chapter 2 deals with floating point representation of real numbers. Some basic things we saw along the way:
We saw how the non-negative integers $0, \dots, 2^n-1$ can fit in $n$ bits in a simple manner.
We saw how the integers $-2^{n-1}, \dots, 0, \dots, 2^{n-1}-1$ can fit in $n$ bits using two's complement for the negative numbers. The advantage of this storage is fast addition and subtraction (see the quick illustration below).
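For instance, we can peek at the storage with base Julia's `bitstring` (a small illustration, not code from the text):

```julia
# Two's complement storage of Int8 values (n = 8 bits stores -2^7, …, 2^7 - 1).
bitstring(Int8(5))          # "00000101"
bitstring(Int8(-5))         # "11111011"  (flip the bits of 5 and add 1)
Int8(5) + Int8(-5)          # 0 -- the same addition circuit works for signed values
```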
The basic storage uses
a sign bit
bits to store the significand which is normalized to be $1.ddd\cdots d$.
some bits to store the exponent, $e_{min} \leq m \leq e_{max}$.
and all this is put together to create the floating point numbers of the form
$$~ \pm 1.ddddd\cdots d \cdot 2^m. ~$$

the sign bit comes first and uses 1 for minus and 0 for plus.
the exponent is stored as an unsigned integer ($0, \dots, 2^k - 1$) and there is an implicit bias to be subtracted. The value $000\cdots 0$ is special and used for $0.0$ (or $-0.0$) and subnormal numbers. The value $111\cdots 1$ is used for Inf, -Inf, and various types of NaN.
the significand has an implicit $1$ in front, except for the special numbers 0, Inf, and NaN. (A small example decoding these bits follows this list.)
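As an illustration (ours, not the book's), the 16-bit Float16 format uses 1 sign bit, 5 exponent bits with a bias of 15, and 10 significand bits:

```julia
# Decompose a Float16: 1 sign bit | 5 exponent bits (bias 15) | 10 significand bits.
x = Float16(6.5)            # 6.5 = 1.1010_2 × 2^2
bitstring(x)                # "0100011010000000"
# sign = 0, exponent bits = 10001 (17; 17 - 15 = 2), significand bits = 1010000000,
# so the value is +1.1010_2 × 2^2 = 6.5.
```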
What is $\delta$? Some number between $-\epsilon$ and $\epsilon$. What is $\epsilon$? Good question.
In Julia, $\epsilon$ is given by `eps`, defined through $\epsilon = 1^+ - 1$, where $1^+$ is the next largest floating point number after $1$. We saw $\epsilon = 2^{-p}$.

We saw that if $x$ and $y$ are real numbers, the relative error of the floating point result of $x-y$ can be large if $x$ is close to $y$.
We saw a theorem that says even if there is no rounding error, the subtraction of $y$ from $x$ can introduce a loss of precision. Basically, if $x$ and $y$ agree to $p$ binary digits, then a shift of $p$ units is necessary. More concretely: if $x > y > 0$ and $1 - y/x \leq 2^{-p}$, then at least $p$ significant binary bits are lost in forming $x-y$. (A quick illustration of $\epsilon$ and of this cancellation follows.)
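A short sketch illustrating both points (the function choices are ours, not the text's):

```julia
# Machine epsilon: the gap from 1.0 to the next larger floating point number.
eps()                       # 2.220446049250313e-16, which is 2^-52 for Float64
nextfloat(1.0) - 1          # the same value

# Loss of precision when subtracting nearly equal quantities:
x = 1e-8
1 - cos(x)                  # 0.0 -- every significant bit cancels
2 * sin(x/2)^2              # ≈ 5.0e-17 -- algebraically equal, but computed accurately
```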
We saw that if possible we should avoid big numbers, as the errors can then be bigger. (This is why the book suggests finding $(a+b)/2$ as $a + (b-a)/2$.)
We saw that when possible we should cut down on the operations used. (One reason why Horner's method for polynomial evaluation is preferred; a short sketch follows.)
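A minimal sketch of Horner's method (an illustration, not the book's code; Julia's built-in `evalpoly` does the same evaluation):

```julia
# Horner's method: p(x) = a0 + x*(a1 + x*(a2 + x*(...))) -- n multiplies and n adds.
function horner(x, coeffs)              # coeffs in ascending order: a0, a1, ..., an
    p = coeffs[end]
    for a in reverse(coeffs[1:end-1])
        p = p * x + a
    end
    p
end

coeffs = [1.0, -3.0, 0.0, 2.0]          # p(x) = 1 - 3x + 2x^3
horner(2.0, coeffs)                     # 11.0
evalpoly(2.0, coeffs)                   # 11.0 -- Julia's built-in Horner evaluation
```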
We saw that errors can accumulate. In particular we discussed this theorem:
If the $x_i$ are positive, the relative error in a naive summation of $\sum x_i$ is $\mathcal{O}(n\epsilon)$. (A numeric check follows.)
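A quick check of this (our own sketch), using Float32 so the effect is visible:

```julia
# Relative error of a naive left-to-right sum of n positive terms is O(n*eps).
function naive_sum(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x              # each rounded addition contributes up to ~eps relative error
    end
    s
end

xs = rand(Float32, 1_000_000)
approx = naive_sum(xs)
exact  = sum(Float64, xs)                      # higher-precision reference
abs(approx - exact) / exact                    # small, and below n * eps(Float32) ≈ 0.12
```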
evaluation of a function when the input is uncertain. That is, we evaluate $f(x+h)$ when we want to find $f(x)$. (It could be $x + h = x(1+\delta)$, say.) For this we have

$$~ \frac{f(x+h) - f(x)}{f(x)} \approx \frac{x f'(x)}{f(x)} \cdot \frac{h}{x}. ~$$

That is, the relative error in the image is the relative error in the domain times a factor $xf'(x)/f(x)$. (A numeric check follows.)
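A small check of this relation (our own example, with $f(x) = \log(x)$, which is poorly conditioned near $x = 1$):

```julia
# Relative change in f(x) ≈ (x * f'(x) / f(x)) * relative change in x.
f(x)  = log(x)
fp(x) = 1 / x
x, δ  = 1.1, 1e-6
lhs = (f(x * (1 + δ)) - f(x)) / f(x)    # observed relative error in the output
rhs = (x * fp(x) / f(x)) * δ            # condition number × relative input error
lhs, rhs                                # both ≈ 1.05e-5, about ten times δ
```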
evaluation of a perturbed function (which can happen with polynomials that have rounded coefficients). For this, we have $F(x) = f(x) + \epsilon g(x)$. The example we had: if $r$ is a root of $f$ and $r+h$ is a root of $F$, what can we say about $h$? Expanding $0 = F(r+h) \approx f(r) + h f'(r) + \epsilon g(r)$, we can see that

$$~ h \approx -\epsilon \frac{g(r)}{f'(r)}, ~$$

which can be big. The example in the book uses the Wilkinson polynomial and $r=20$. (The Wilkinson polynomial actually is exactly this case, as there is necessary rounding to get its coefficients into floating point.)
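A small check of this estimate with an example of our own (not the book's Wilkinson case), chosen so the perturbed root can be found exactly with the quadratic formula:

```julia
# F(x) = f(x) + ε*g(x); if r is a simple root of f, the nearby root of F is ≈ r + h
# with h ≈ -ε * g(r) / f'(r).
f(x)  = x^2 - 2                    # root r = √2
fp(x) = 2x
g(x)  = x                          # the perturbation; F(x) = x^2 + ε*x - 2
ε = 1e-6
r = sqrt(2)
h_predicted = -ε * g(r) / fp(r)                # -ε/2 = -5.0e-7
h_actual    = (-ε + sqrt(ε^2 + 8)) / 2 - r     # exact root of F via the quadratic formula
h_predicted, h_actual                          # agree to several digits
```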
If $1 = 1.00 \cdot 10^0$, what is $\epsilon$?
What is $3.14 \cdot 10^0 - 3.15 \cdot 10^0$?
What is $4.00 \cdot 10^0$ times $3.00 \cdot 10^1$?
What is $\delta$ (where $fl(x \cdot y) = (x\cdot y)\cdot (1 + \delta)$) when computing $1.23 \cdot 10^4$ times $4.32 \cdot 10^1$?
How many total numbers are representable in this form ($0$ is not)?
What is $\epsilon$?
What is $1.11 \cdot 2^1 - 1.00 \cdot 2^0$?
Convert the number $-1.01 \cdot 2^{-2}$ to decimal.
Let $x=1.11 \cdot 2^0$ and $y=1.11 \cdot 2^1$. Find $\delta$ in $fl(x \cdot y) = (x \cdot y)(1 + \delta)$.
A 16-bit floating point value is stored as the bits `0101000101000000`. (The first bit, `0`, is the sign bit; the exponent bits are `10100`; and the significand bits are `0101000000`.) Can you find the number? Remember the exponent is encoded and you'll need to subtract `01111`, then convert.
`E = expm1(x)` is the more precise version of $e^x - 1$:

$$~ (1/2) \cdot E + E/(E+1) ~$$
Can you think of why the direct approach might cause issues for some values of $x$ in that range?
What value of $k$ will ensure that the error over $[0, 1/4]$ is no more than $10^{-3}$?
That is, floating point multiplication is not associative. You can verify by testing `(0.1 * 0.2) * 0.3` and `0.1 * (0.2 * 0.3)`.
That is, if you computed the difference quotient, $(f(x+h)-f(x))/h$, in floating point, would you expect smaller and smaller values of $h$ to converge? Why?
This chapter is about solving for zeros of a real-valued, scalar function $f(x)$.
This came from the intermediate value theorem: if $f(x)$ is continuous on $[a,b]$, then for any $y$ in the interval between $f(a)$ and $f(b)$, there exists $c$ such that $f(c) = y$.
The special case: when $f(a) \cdot f(b) < 0$ (that is, $[a,b]$ is a bracket), there is a $c$ where $f(c) = 0$.
A proof follows by successively bisecting the interval. Set $a_0, b_0 = a, b$, and let $c_0 = (a_0 + b_0)/2$. Then $f(c_0)$ is either positive, negative, or $0$. If $0$, we can stop. If not, then either $[a_0,c_0]$ or $[c_0,b_0]$ will be a bracket. Call it $[a_1,b_1]$ and define $c_1$ as its midpoint. We repeat and get a sequence $c_0, c_1, \dots$. If this terminates, we are done. Otherwise, since it can be shown that $|c_n - c_{n+k}| \leq 2^{-n}|b_0 - a_0|$, the sequence $c_i$ has a limit $c$. This limit will be the zero. We can't have $f(c) > 0$: the values of $c_i$ where $f(c_i) < 0$ will also have limit $c$, and by continuity $f(c) \leq 0$. (This is provided there is an infinite sequence of $c_i$'s with $f(c_i) < 0$, which requires proof.) Similarly, we can't have $f(c) < 0$. So it must be $0$.
The point of the proof is that there is a bound on the error:
$$~ |c_n - c| \leq \frac{1}{2} |b_n - a_n| = \frac{1}{2^{(n+1)}} |b_0 - a_0|. ~$$

We discussed various means to find the midpoint (a short implementation sketch follows this list):
using (a + b)/2
using a + (b-a)/2
using the midpoint after reinterpreting the floating point values as ordered integers
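Here is a minimal sketch of a bisection function (an illustration, not the book's code), using the recommended midpoint formula and assuming $f(a)\cdot f(b) < 0$:

```julia
# Bisection: repeatedly halve a bracketing interval [a, b] (assumes f(a)*f(b) < 0).
function bisection(f, a, b; tol = 1e-12)
    fa = f(a)
    while b - a > tol
        c  = a + (b - a) / 2        # midpoint, computed the recommended way
        fc = f(c)
        fc == 0 && return c
        if fa * fc < 0              # zero is in [a, c]
            b = c
        else                        # zero is in [c, b]
            a, fa = c, fc
        end
    end
    a + (b - a) / 2
end

bisection(x -> x^2 - 2, 1, 2)       # ≈ 1.4142135623731 (√2)
```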
The bound for Newton's method, $e_{n+1} \approx \frac{1}{2} \frac{f''(\xi)}{f'(x_n)} e_n^2$, shows issues can occur if a) the initial error is big (a poor guess), b) the first derivative is close to $0$ (near a min or max), or c) the concavity is large (the function curves too much to be well approximated by a line).
We saw three facts:
For a simple zero, we could find $\delta > 0$ so that convergence was quadratic. (Find $\delta C(\delta) < 1$.)
For a concave up, increasing $C^2$ function, we had guaranteed convergence.
For $f(x) = (g(x))^k$, we had linear convergence
The secant method has convergence between linear and quadratic, but only takes 1 function call per step. (Sketches of both methods follow.)
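Minimal sketches of both methods (illustrations, not the book's code), written for a fixed number of steps rather than a stopping test:

```julia
# Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n).
function newton(f, fp, x0; steps = 10)
    x = x0
    for _ in 1:steps
        x -= f(x) / fp(x)
    end
    x
end

# Secant method: replace f' by the slope through the last two points.
function secant(f, x0, x1; steps = 10)
    f0, f1 = f(x0), f(x1)
    for _ in 1:steps
        f1 == f0 && break                               # stop once the iterates stall
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, f(x1)                              # only one new f call per step
    end
    x1
end

newton(x -> x^2 - 2, x -> 2x, 1.0)      # ≈ 1.4142135623731 (√2)
secant(x -> x^2 - 2, 1.0, 2.0)          # ≈ 1.4142135623731 (√2)
```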
We define a contractive map as a function $F$ which takes some domain $C$ (a closed set) into itself and satisfies, for some $\lambda < 1$:
$$~ |F(x) - F(y)| < \lambda |x - y|. ~$$

We can define a sequence by $x_{n+1} = F(x_n)$. For a contractive map, this sequence will converge; that is, $F$ will have a unique fixed point, $s$.
If $q \geq 1$ is the first integer with $F^{(q)}(s) \neq 0$, then $e_{n+1} \approx C e_n^q$.
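A short sketch of fixed point iteration (our own example: $F(x) = \cos(x)$ maps $[0,1]$ into itself with $|F'(x)| \leq \sin(1) < 1$ there):

```julia
# Fixed point iteration x_{n+1} = F(x_n) for the contractive map F(x) = cos(x) on [0, 1].
function fixed_point(F, x0; steps = 100)
    x = x0
    for _ in 1:steps
        x = F(x)
    end
    x
end

s = fixed_point(cos, 0.5)       # ≈ 0.7390851332151607
s ≈ cos(s)                      # true: s is (numerically) the fixed point
```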
Let $f(x) = x^2 - 2$. Starting with $a_0, b_0 = 1, 2$, find $a_4, b_4$.
Let $e_n$ be $c_n - c$. The order of convergence of $c_n$ is $q$ provided $\lim_{n\to\infty} |e_{n+1}|/|e_n|^q = C$ for some nonzero constant $C$.
Using the bound above, what is the obvious guess for the order of convergence?
Explain why the bisection method is no help in finding the zeros of $f(x) = (x-1)^2 \cdot e^x$.
In floating point, the computation of the midpoint via $(a+b)/2$ is discouraged and using $a + (b-a)/2$ is suggested. Why?
Mathematically, if $a < b$, it is always the case that there exists a $c = (a+b)/2$ with $a < c < b$. Is this also always the case in floating point? Can you think of an example of when it wouldn't be?
To compute $\pi$ as a solution to $\sin(x) = 0$, one might use the bisection method with $a_0, b_0 = 3,4$. Were you to do so, how many steps would it take to find an error of no more than $10^{-16}$?
A simple zero for a function $f(x)$ is one where $f'(x) \neq 0$. Some algorithms have different convergence properties for functions with only simple zeros as compared to those with non-simple zeros. Would the bisection algorithm have a difference?
If you answered yes above, you could still be right, even though you'd be wrong mathematically. (Why? Look at the bound on the error and the assumptions on $f$.) This is because for functions with non-simple zeros, a lot of numeric issues can creep in. The book gives an example with the function $f(x) = (x-1)^5$. Explain what is going on with this graph near $x=1$:
```julia
using Plots
f(x) = x^5 - 5x^4 + 10x^3 - 10x^2 + 5x - 1   # (x-1)^5 with its coefficients expanded
plot(f, 0.999, 1.001)
```
For $f(x) = x^2 - 2$, $x_0 = 1$, and $x_1 = 1.5$, compute 3 steps of a) the bisection method, b) Newton's method, c) the secant method.
The function $f(x) = \sin(x)$ has $[3,4]$ as a bracketing interval. Give a bound on the error $c_n - r$ after 10 steps of the bisection method.
The function $f(x) = x^2 - s$, with $s > 0$, has $\sqrt{s}$ as a solution. Compute $1/2 \cdot f''(\xi)/f'(x_0)$ for $x_0 = s$. Compute the error, $e_1$.
For $f(x) = \sin(x)$, find an interval $[-\delta, \delta]$ for which the Newton iterates will converge quadratically to $0$.
Newton's method is applied to the function $f(x) = \log(x) - s$ to find $e^s$. If $x_n < e^s$ show $x_{n+1} < e^s$. If $e_0 > 0$ yet $x_1 > 0$, show $e_1 < 0$. (mirror the proof of one of the theorems)
Suppose a student tries the following for Newton's method: $x_{n+1}=f(x_n)/f'(x_n)$. Is this method guaranteed to converge? (Is the mapping clearly contractive?) If by chance it did converge, what equation would the fixed point satisfy?
Is $F(x) = \sqrt{x}$ contractive over $C=[0,1]$? Show it is or isn't.
The expression $\sqrt{p + \sqrt{p + \sqrt{p + \cdots}}}$ converges. What does the answer satisfy? (Express as $x_{n+1} = \sqrt{p + x_n}$.)