Chapter 1 covered mathematical theorems from calculus and some definitions:
An example of a $C^1$ function is $F(x) = \int_a^x f(u)\, du$ when $f$ is continuous. This example can be re-integrated to get examples of $C^k$ functions.
- The intermediate value theorem
- The extreme value theorem
- The mean value theorem. (We formulated this in an extended sense due to Cauchy.)
Taylor's theorem: Let $f$ be a $C^{k+1}$ function on $[a,b]$. Let $a < c < b$ and let $x$ be in $(a,b)$. Then there exists a $\xi$ between $c$ and $x$ satisfying:
$$ f(x) = T_k(x) + \frac{f^{(k+1)}(\xi)}{(k+1)!} (x - c)^{k+1}, $$
where $T_k(x) = f(c) + f'(c)(x-c) + \cdots + \frac{f^{(k)}(c)}{k!} (x - c)^k$.
$T_k$ is the Taylor polynomial of degree $k$ for $f$ about $c$.
There are other ways to write the remainder term, and the assumptions on $f$ can be somewhat relaxed, but this form is the easiest to remember.
A less precise, but still true, form of the above is:
$$ f(x) = T_k(x) + \mathcal{O}((x-c)^{k+1}). $$
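As a concrete check in Julia (the choices of $f = \sin$, $k = 5$, and $x = 0.5$ are just for illustration):

```julia
# Degree-5 Taylor polynomial of sin about c = 0: T5(x) = x - x^3/6 + x^5/120.
T5(x) = x - x^3/6 + x^5/120
x = 0.5
abs(sin(x) - T5(x))   # ≈ 1.5e-6, within the remainder bound 0.5^7/7! ≈ 1.55e-6
```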
- There exists a $\xi$ in $[a,b]$ with $f(\xi) = 0$.
- There exists a $\xi$ in $[a,b]$ with $f'(\xi) = 0$.
- There exists a $\xi$ in $[a,b]$ with $f'(\xi) \cdot (b-a) = f(b) - f(a)$.
$$ \lim_{h \rightarrow 0} \frac{f(x+h) - f(x-h)}{2h} = f'(x). $$
What are the assumptions on $f$ used in your proof?
- $\sin(x)$ at $c=0$
- $\log(1+x)$ at $c=0$
- $1 / (1 + x)$ at $c=0$
- $\arctan(x)$ at $c=0$
- $\sin(x)$
- $e^x$
$$ \frac{f^{(k+1)}(\xi)}{(k+1)!} (x-0)^{k+1} \leq 2^{-53} $$
We needed a very large value of $k$. What if we tried this over a smaller interval, say $0 \leq x \leq 1/2$, instead? How big would $k$ need to be then?
We used $f^{(k)}(x)/k! = \pm\, 1/(k (1+x)^k)$.
Chapter 2 deals with floating point representation of real numbers. Some basic things we saw along the way:
- We saw how the non-negative integers $0, \dots, 2^n-1$ can fit in $n$ bits in a simple manner.
- We saw how the integers $-2^{n-1}, \dots, 0, \dots, 2^{n-1}-1$ can fit in $n$ bits using two's complement for the negative numbers. The advantage of this storage is fast addition and subtraction.
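A quick way to see this in Julia (using the built-in `bitstring`):

```julia
# 8-bit two's complement: negatives are stored as "flip the bits and add 1".
bitstring(Int8(5))    # "00000101"
bitstring(Int8(-5))   # "11111011"
Int8(5) + Int8(-5)    # 0 -- the same addition circuitry works for both signs
```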
The basic storage uses
- a sign bit
- $p$ bits to store the significand which is normalized to be $1.ddd\cdots d$.
- some bits to store the exponent, $e_{min} \leq m \leq e_{max}$.
and all this is put together to create the floating point numbers of the form
$$ \pm 1.ddddd\cdots d \cdot 2^m. $$
- the sign bit comes first and uses `1` for minus and `0` for plus.
- the exponent is stored as an unsigned integer ($0, \dots, 2^k - 1$) and there is an implicit bias to be subtracted. The value $000\cdots 0$ is special and used for $0.0$ (or $-0.0$) and subnormal numbers. The value $111\cdots 1$ is used for `Inf`, `-Inf`, and various types of `NaN`.
- the significand has an implicit $1$ in front, except for the special numbers `0`, `Inf`, and `NaN`.
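A sketch of how to peek at these fields for a `Float64` (which uses 1 sign bit, 11 exponent bits with bias $1023$, and $p = 52$ significand bits):

```julia
# Split the 64 bits of 1.25 = +1.01_2 * 2^0 into the three fields.
s = bitstring(1.25)
sgn, expo, frac = s[1:1], s[2:12], s[13:64]
m = parse(Int, expo; base = 2) - 1023   # subtract the bias; here m == 0
```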
Rounding a real number $x$ into this format produces $fl(x)$ with
$$ fl(x) = x(1 + \delta) $$
What is $\delta$? Some number between $-\epsilon$ and $\epsilon$. What is $\epsilon$? Good question.
We defined `eps` through $\epsilon = 1^+ - 1$, where $1^+$ is the next floating point number larger than $1$. We saw $\epsilon = 2^{-p}$.
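This is easy to check in Julia with `nextfloat`:

```julia
# Machine epsilon: the gap between 1 and the next larger Float64.
ϵ = nextfloat(1.0) - 1.0
ϵ == eps() == 2.0^-52   # true; here p = 52
```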
- We saw that if $x$ and $y$ are real numbers, the relative error of the floating point result of $x - y$ can be large if $x$ is close to $y$.
- We saw a theorem that says even if there is no rounding error, the subtraction of $y$ from $x$ can expose a loss of precision. Basically, if $x$ and $y$ agree to $p$ binary digits, then a normalizing shift of $p$ places is necessary. More concretely: if $x > y > 0$ and $1 - y/x \leq 2^{-p}$, then at least $p$ significant binary bits are lost in forming $x - y$. The sketch below illustrates the effect.
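A two-line demonstration (the value `1e-15` is an arbitrary choice):

```julia
# x and y agree in their leading ~50 bits, so the difference has few left.
x, y = 1 + 1e-15, 1.0
(x - y) / 1e-15   # ≈ 1.11: an 11% relative error exposed by the subtraction
```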
- We saw that, if possible, we should avoid big intermediate numbers, as the errors are then possibly bigger. (This is why the book suggests finding $(a+b)/2$ as $a + (b-a)/2$.)
- We saw that when possible we should cut down on the number of operations used. (One reason why Horner's method, sketched below, is preferred for polynomial evaluation.)
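A minimal sketch of Horner's method (the function name `horner` is mine):

```julia
# Evaluate c[1] + c[2]*x + ... + c[n]*x^(n-1) with n-1 multiply-adds.
function horner(coeffs, x)
    acc = coeffs[end]
    for c in reverse(coeffs[1:end-1])
        acc = muladd(acc, x, c)   # acc*x + c
    end
    acc
end

horner([-1, 5, -10, 10, -5, 1], 0.999)   # (x-1)^5 evaluated near x = 1
```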
- We saw that errors can accumulate. In particular we discussed this theorem:
If the $x_i$ are positive, the relative error in a naive summation of $\sum x_i$ is $\mathcal{O}(n\epsilon)$.
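A sketch of the accumulation, summing $10^7$ copies of $0.1$ with a `BigFloat` reference value:

```julia
n  = 10^7
xs = fill(0.1, n)
naive = foldl(+, xs)         # strict left-to-right summation
exact = n * big(0.1)         # same float value, widened to BigFloat
abs(naive - exact) / exact   # compare with the bound n * eps() ≈ 2.2e-9
```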
- evaluation of a function when the input is uncertain. That is, we evaluate $f(x+h)$ when we want to find $f(x)$. (It could be that $x + h = x(1+\delta)$, say.) For this we have
$$ \frac{f(x+h) - f(x)}{f(x)} \approx \frac{x f'(x)}{f(x)} \cdot \frac{h}{x}, $$
That is, the relative error in the output is approximately the relative error in the input times the factor $x f'(x)/f(x)$.
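A numeric check of this, using $f(x) = \sqrt{x}$ as an assumed example (its factor is $x f'(x)/f(x) = 1/2$):

```julia
f(x) = sqrt(x)
x, h = 2.0, 1e-8
lhs = (f(x + h) - f(x)) / f(x)   # actual relative change in the output
rhs = (1/2) * (h / x)            # condition factor times relative input error
(lhs, rhs)                       # both ≈ 2.5e-9
```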
- evaluation of a perturbed function (which can happen with polynomials that have rounded coefficients). For this, we write $F(x) = f(x) + \epsilon g(x)$. The example we had: if $r$ is a root of $f$ and $r+h$ is a root of $F$, what can we say about $h$? Expanding $0 = F(r+h) \approx f(r) + h f'(r) + \epsilon g(r) = h f'(r) + \epsilon g(r)$, we can see that
$$ h \approx -\epsilon g(r)/f'(r) $$
which can be large. The example in the book uses the Wilkinson polynomial and $r = 20$. (The Wilkinson polynomial actually is exactly this case, as rounding is necessary to get its coefficients into floating point.)
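A toy instance of the estimate (not the Wilkinson example; the choices $f(x) = (x-1)(x-2)$, $g(x) = x^2$, and $\epsilon = 10^{-6}$ are mine):

```julia
# f(x) = x^2 - 3x + 2 has roots 1 and 2; F(x) = f(x) + ε*x^2 perturbs them.
fp(x) = 2x - 3                  # f'(x)
g(x)  = x^2
ε, r  = 1e-6, 2.0
h_est = -ε * g(r) / fp(r)       # predicted shift ≈ -4.0e-6
# exact root of (1+ε)x^2 - 3x + 2 near r = 2, via the quadratic formula:
r_pert = (3 + sqrt(9 - 8 * (1 + ε))) / (2 * (1 + ε))
(r_pert - r, h_est)             # both ≈ -4.0e-6
```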
- If $1 = 1.00 \cdot 10^0$, what is $\epsilon$?
- What is $3.14 \cdot 10^0 - 3.15 \cdot 10^0$?
- What is $4.00 \cdot 10^0$ times $3.00 \cdot 10^1$?
- What is $\delta$ (where $fl(x \cdot y) = (x \cdot y)(1 + \delta)$) when computing $1.23 \cdot 10^4$ times $4.32 \cdot 10^1$?
- How many total numbers are representable in this form ($0$ is not one of them)?
- What is $\epsilon$?
- What is $1.11 \cdot 2^1 - 1.00 \cdot 2^0$?
- Convert the number $-1.01 \cdot 2^{-2}$ to decimal.
- Let $x=1.11 \cdot 2^0$ and $y=1.11 \cdot 2^1$. Find $\delta$ in $fl(x \cdot y) = (x \cdot y)(1 + \delta)$.
Consider the 16-bit floating point number stored as `0101000101000000`. The first bit, `0`, is the sign bit; the next five bits, `10100`, are the exponent; and the last ten bits, `0101000000`, are the significand. Can you find the number? Remember the exponent is encoded: you'll need to subtract the bias `01111` and then convert.

`E = expm1(x)` is the more precise version of $e^x - 1$ for $x$ near $0$. With it, $\sinh(x) = (e^x - e^{-x})/2$ can be computed as
$$ \frac{1}{2} \left( E + \frac{E}{E + 1} \right). $$
Can you think of why the direct approach might cause issues for some values of $x$ in that range?
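A sketch of the comparison for one small value of $x$ (the choice `x = 1e-12` is mine):

```julia
x = 1e-12
direct = (exp(x) - exp(-x)) / 2   # cancellation: both exponentials are ≈ 1
E = expm1(x)
stable = (E + E / (E + 1)) / 2    # the rearranged formula from above
(direct, stable, sinh(x))         # stable agrees with sinh; direct loses digits
```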
- $\log(x) - \log(y)$
- $x^{-3} (\sin(x) - x)$
- $\sin(x) - \tan(x)$.
What value of $k$ will ensure that the error over $[0, 1/4]$ is no more than $10^{-3}$?
$$ fl(fl(xy)\cdot z) \neq fl(x \cdot fl(yz)) $$
That is, floating point multiplication is not associative. You can verify by testing `(0.1 * 0.2) * 0.3` and `0.1 * (0.2 * 0.3)`.
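Checking in Julia:

```julia
(0.1 * 0.2) * 0.3 == 0.1 * (0.2 * 0.3)   # false
((0.1 * 0.2) * 0.3, 0.1 * (0.2 * 0.3))   # (0.006000000000000001, 0.006)
```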
$$ \lim_{n\rightarrow\infty} \frac{fl(f(x+10^{-n})) - fl(f(x))}{10^{-n}} $$
That is, if you computed the difference quotient $(f(x+h) - f(x))/h$ in floating point, would you expect smaller and smaller values of $h$ to lead to convergence? Why?
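A sketch of the experiment (the choices $f = \sin$ and $x = 1$ are mine):

```julia
f, x = sin, 1.0
for n in 1:16
    h = 10.0^-n
    println((h, (f(x + h) - f(x)) / h - cos(x)))   # difference from f'(x)
end
# the error shrinks until h ≈ 1e-8 and then rounding noise takes over
```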
$$ \log(y(x+h)) - \log(y(x)) \approx \frac{y(x+h) - y(x)}{y(x)}. $$
This chapter is about solving for zeros of a real-valued, scalar function $f(x)$.
We only managed to cover the first section, on the bisection method. This is related to the intermediate value theorem: if $f(x)$ is continuous on $[a,b]$, then for any $y$ in the interval between $f(a)$ and $f(b)$, there exists $c$ in $[a,b]$ such that $f(c) = y$.
The special case is when $f(a) \cdot f(b) < 0$ (we call $[a,b]$ a bracket): then there is a $c$ where $f(c) = 0$.
A proof follows by repeatedly bisecting the interval. Set $a_0, b_0 = a, b$, and let $c_0 = (a_0 + b_0)/2$. Then $f(c_0)$ is positive, negative, or $0$. If $0$, we can stop. If not, then either $[a_0, c_0]$ or $[c_0, b_0]$ will be a bracket. Call it $[a_1, b_1]$ and define $c_1$ as its midpoint. We repeat and get a sequence $c_0, c_1, \dots$. If this terminates, we are done. Otherwise, since it can be shown that $|c_n - c_{n+k}| \leq 2^{-n} |b_0 - a_0|$, the $c_i$ have a limit $c$. This limit will be the zero. We can't have $f(c) > 0$: the values $c_i$ where $f(c_i) < 0$ also have limit $c$, and by continuity $f(c) \leq 0$. (This is provided there is an infinite sequence of $c_i$s with $f(c_i) < 0$, which requires proof.) Similarly, we can't have $f(c) < 0$. So it must be $0$.
The point of the proof is that there is a bound on the error:
$$ |c_n - c| \leq \frac{1}{2} |b_n - a_n| \leq \frac{1}{2^{(n+1)}} |b_0 - a_0|. $$
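A minimal sketch of the method under the assumptions above ($f$ continuous on $[a,b]$ with $f(a) \cdot f(b) < 0$; the name `bisection` and the default of 53 bisections are my choices):

```julia
function bisection(f, a, b; n = 53)
    fa = f(a)
    for _ in 1:n
        c = a + (b - a) / 2      # midpoint, written as the book suggests
        fc = f(c)
        fc == 0 && return c
        if fa * fc < 0
            b = c                # [a, c] is the new bracket
        else
            a, fa = c, fc        # [c, b] is the new bracket
        end
    end
    a + (b - a) / 2
end

bisection(x -> x^2 - 2, 1, 2)    # ≈ sqrt(2)
```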
Recall the definition of the order of convergence $q$ (with asymptotic error constant $A$) for the errors $e_n = |c_n - c|$:
$$ \lim_n \frac{e_{n+1}}{e_n^q} = A $$
Using the bound above, what is the obvious guess for the order of convergence?
```julia
using Gadfly
# f(x) = (x - 1)^5, written in expanded form
f(x) = x^5 - 5x^4 + 10x^3 - 10x^2 + 5x - 1
plot(f, 0.999, 1.001)
```