Solving for zero is a mathematical skill taught early on. In some cases, such as with linear equations, solving for zeros can be done directly using algebra. Similarly, in the case of factorable polynomials, we are taught to factor and then set each term to 0 to find the possible solutions.
However, in general, the problem of finding one (or all) solutions to the equation
\[ f(x) = 0 \]
for an arbitrary \(f\) has no well-defined process.
A related problem is to find one (or all) solutions to an equation of this type:
\[ f(x) = g(x) \]
Conceptually this is identical to the above, as we just set \(h(x) = f(x) - g(x)\) and solve for when \(h(x)\) is \(0\).
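For instance, a minimal illustration in julia (the functions f and g here are made up just for this example):

```julia
## reduce f(x) = g(x) to h(x) = 0
f(x) = x^2
g(x) = 2x
h(x) = f(x) - g(x)
h(0), h(2)  ## (0,0), so x = 0 and x = 2 solve f(x) = g(x)
```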
Here we discuss a few elementary means of finding zeros with julia, leaving some others for a later time.
Polynomials are special functions, in that their relatively simple form allows many things to be known explicitly. E.g., for quadratic polynomials, one of these facts is the quadratic formula, which explicitly gives the roots of a quadratic polynomial. (A "root" of a polynomial is just a polynomial-specific name for a zero.)
For example, if we have the quadratic polynomial \(2x^2 + 3x - 2\) we can solve for the roots with:
a = 2; b = 3; c = -2
discr = b^2 - 4*a*c
## 25
( (-b + sqrt(discr))/(2a), (-b - sqrt(discr))/(2a) )
## (0.5,-2.0)
If you wanted to write a function to do this, it would be somewhat straightforward, save for the detail of needing to make a negative number complex in order to take its square root:
## find roots of ax^2 + bx + c
function quadratic(a, b, c)
discr = b^2 - 4*a*c
sq = (discr >= 0) ? sqrt(discr) : sqrt(discr + 0im) ## complex sqrt if needed
[(-b - sq)/(2a), (-b + sq)/(2a)]
end
And to find the roots of \(x^2 + x - 1\) we have:
quadratic(1, 1, -1)
## [-1.618033988749895,0.6180339887498949]
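To see the complex branch in action, consider \(x^2 + 1\), whose discriminant is negative. (The quadratic function is repeated here so the block is self-contained.)

```julia
## find roots of ax^2 + bx + c, allowing complex roots
function quadratic(a, b, c)
    discr = b^2 - 4*a*c
    sq = (discr >= 0) ? sqrt(discr) : sqrt(discr + 0im)
    [(-b - sq)/(2a), (-b + sq)/(2a)]
end

quadratic(1, 0, 1)  ## the complex roots -im and im
```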
There are further such formulas for third- and fourth-degree polynomials. However, Galois, who died at the tender age of 20, demonstrated that in general there can be no such formula for the roots of a polynomial of degree five or higher. Still, facts are known about such polynomials. For example, the fundamental theorem of algebra states that every polynomial of degree \(n\) will have \(n\) roots, where we count complex roots and multiplicities.
How to find them is left to numeric methods. In julia there is an add-on package that will do this work for us. Using add-on code in julia is relatively painless. In this case, the code is in the Polynomial package. Once installed (Pkg.add("Polynomial")), we load that package with:
using Polynomial ## may need Pkg.add("Polynomial")
The command to find the roots of the polynomial \(f(x)\) is simply roots. That's easy. However, specifying the polynomial is done differently than just defining \(f(x)\) as a function.
A general polynomial is defined through its coefficients:
\[ a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0. \]
Just like with our quadratic function, to represent a polynomial most efficiently, only the coefficients are needed. We place them into a vector with the leading coefficient first:
[an, a_n1, a_n2, ..., a1, a0]
Or to be specific, if \(f(x) = -16x^2 + 32 x\) then this is
\[ -16 x^2 + 32 x + 0 \]
So \(a_2 = -16\), \(a_1 = 32\) and importantly \(a_0 = 0\), so we would use [-16, 32, 0].
The key point is that we must put \(0\) for any missing terms and a \(1\) when the term is a simple monomial like \(x^2\).
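For example, to encode \(f(x) = x^3 - 2x\) we supply four coefficients, padding the missing terms with \(0\). A direct (package-free) check that the vector encodes the right polynomial, by summing \(a_k x^k\) at a point:

```julia
## coefficients of x^3 + 0x^2 - 2x + 0, leading coefficient first
coeffs = [1, 0, -2, 0]
x = 3
n = length(coeffs)
total = sum([coeffs[i] * x^(n - i) for i in 1:n])
total  ## 3^3 - 2*3 = 21
```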
Creating a polynomial object from this specification is done with the Poly constructor:
p = Poly([-16, 32, 0])
## Poly(-16x^2 + 32x^1)
And finding the roots is done with the roots function:
roots(p)
## [2.0,0.0]
That's it.
Poly objects have a few other functions defined for them, including the standard mathematical operations.
To illustrate, let's build up the polynomial \(p(x) = (x-1)(x-2)^2(x-3)^2\):
p1, p2, p3 = Poly([1,-1]), Poly([1, -2]), Poly([1, -3])
## (Poly(1x^1 + -1),Poly(1x^1 + -2),Poly(1x^1 + -3))
p = p1 * p2^2 * p3^2
## Poly(1x^5 + -11x^4 + 47x^3 + -97x^2 + 96x^1 + -36)
To see that we recover the three distinct roots we started with in the factored form:
roots(p)
## [0.9999999999999978,1.9999998107957662,2.0000001892043513,3.000000300887259,2.999999699112625]
The polyval function can be used to evaluate the polynomial. For example, here is how we create a function from a polynomial object:
f(x) = polyval(p, x)
The polyval function does the hard work. It multiplies the coefficients of p against the proper powers of x and adds the results.
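An efficient way to carry out this sum is Horner's method, which evaluates \(((a_n x + a_{n-1})x + \cdots)x + a_0\). Here is a minimal sketch in base julia (poly_eval is our own name, not the package's polyval):

```julia
## Horner's method: fold the coefficients, leading coefficient first
function poly_eval(coeffs, x)
    total = 0
    for c in coeffs
        total = total * x + c
    end
    total
end

poly_eval([-16, 32, 0], 2)  ## -16*2^2 + 32*2 = 0
```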
For example, we can graph f with:
plot(f, -1, 3)
What isn't easy to do without more work is to start with a function, such as f(x) = x^2 + 3x, and produce a Poly object without just typing in the coefficients.
Use the roots command to find the largest real root of the polynomial \(x^2 + x - 1\).
Use the roots command to find the largest real root of the polynomial \(x^3 - x + 13\).
For the polynomial defined by
p = Poly([1, 2, 3, 4, 5]);
find the value of this polynomial when \(x=2\).
More generally, the equation \(f(x) = 0\) may not have any special form leading to a known solution. In this case, there are various techniques to find zeros. Here we mention graphing, such as is done with a graphing calculator. In the next section, we discuss the bisection algorithm for root finding.
Example: Finding a zero
The flight of an arrow can be modeled using various functions, depending on assumptions. Suppose an arrow is launched in the air from a height of 0 feet above the ground at an angle of \(\theta = \pi/4\). With a suitable choice for the initial velocity, a model without wind resistance for the height of the arrow at a distance \(x\) units away may be:
\[ j(x) = \tan(\theta) x - 16\left(\frac{x}{200 \cos\theta}\right)^2. \]
With a wind resistance given by \(k\), again with some units, a similar equation can be constructed. It takes a different form:
\[ g(x) = \tan(\theta) x + \frac{32/k}{200 \cos\theta} x - \frac{32}{k} \log\left(\frac{200 \cos\theta}{200\cos\theta - x}\right). \]
For each model, we wish to find the value of \(x\) after launching where the height is modeled to be 0. That is how far will the arrow travel before touching the ground?
For the model without wind resistance, we can graph the function easily enough. Let's guess the distance is no more than 500 feet:
begin
theta = pi/4
j(x) = tan(theta) * x - 16(x/(200cos(theta)))^2
plot(j, 0, 500)
end
Well, we haven't even seen the peak yet. Better to do a little spadework first. This is a quadratic equation, so we can use our function quadratic. We just need to identify a, b and c.
The value of a comes from factoring out the \(x^2\):
a = - 16 * (1/(200cos(theta)))^2
## -0.0008
The values for b and c should be clear, so we get:
begin
a = - 16 * (1/(200cos(theta)))^2
b = tan(theta)
c = 0
roots(Poly([a,b,c])) ## or quadratic(a, b, c)
end
## [1249.9999999999998,0.0]
We see that 1250 is the value. So we plot over this domain to visualize the flight:
plot(j, 0, 1250)
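As a quick sanity check, plugging \(x = 1250\) back into \(j\) should give \(0\), up to floating point error:

```julia
## the model without wind resistance, as defined above
theta = pi/4
j(x) = tan(theta) * x - 16(x/(200cos(theta)))^2
abs(j(1250)) < 1e-6  ## true: 1250 is (numerically) a zero
```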
As for the model with g, we first define the function. This same function appeared in a previous example. We copy that work here:
function g(x; theta = pi/4, k = 1/2)
a = 200*cos(theta)
b = 32/k
tan(theta)*x + (b/a)*x - b*log(a/(a-x))
end
A quick plot over the same interval, \([0, 1250]\) yields:
plot(g, 0, 1250)
## "DomainError()"
Oh, "Domain Error." Of course: when \(x \geq a\), the evaluation of \(\log(a/(a-x))\) will fail. We try again on a reduced interval, avoiding the obvious asymptote at a by subtracting \(1\):
a = 200*cos(theta)
plot(g, 0, a - 1)
Now we can see the zero is between 130 and 140. We re-plot:
plot(g, 130, 140)
Finally, we plot between 134 and 135 and draw the line for \(y=0\) to see the zero clearly:
plot([g, x -> 0], 134, 135)
The answer is approximately \(134.8\).
Finally, we plot both graphs at once to see that it was a very windy day indeed.
a = 134.8
plot([j, u -> u < a ? g(u) : NaN] , 0, 1250)
Repeat the above with a value of \(k = 0.5\). How far does the arrow travel, to one decimal point?
In an analysis of rainbows, Airy developed a special function implemented as airy in julia. The zeros are all negative. The first one is between \(-3\) and \(-1\). Find it graphically.
The polynomial \(f(x) = x^5 - 6x^3 - 6x^2 - 7x - 6\) has three real roots. Which of the following values is one of them? Try to solve this graphically.
The last example had us graphically "zoom" in on a zero, and led us to an estimate to \(1\) or \(2\) decimal points. Trying to get more than that accuracy graphically is at best tedious. Here we discuss a method to get as much accuracy as is numerically possible.
The bisection algorithm is a simple, iterative procedure for finding a numeric, approximate root of a continuous function over the closed interval \([a,b]\) provided \(f(a)\) and \(f(b)\) are not \(0\) and do not share the same sign. We say \(a\) and \(b\) bracket a root. With these assumptions, the fact there is at least one root (possibly more) is due to the intermediate value theorem.
The basic idea is that the midpoint of \([a,b]\), \(M = (a + b)/2\), is tested for its function value. If \(f(M) = 0\), great, we are done. If it has the opposite sign of \(f(a)\), then a root must be in the interval \([a,M]\), so the problem is reduced to a smaller interval. Otherwise, it has the opposite sign of \(f(b)\) and the problem is reduced to \([M,b]\). At this point, the algorithm is repeated. As each step halves the interval length, it must eventually converge to an answer within some specified tolerance. An alternative name for this algorithm is divide-and-conquer.
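Because each step halves the interval, we can count in advance roughly how many halvings are needed: after \(n\) steps the interval has length \((b-a)/2^n\), so \(n \approx \log_2((b-a)/\text{tol})\). A quick check (the helper name steps_needed is our own):

```julia
## halvings needed to shrink an interval of length b - a below tol
steps_needed(a, b, tol) = ceil(log2((b - a)/tol))

steps_needed(0, 2, 1e-8)  ## 28.0
```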
To code this requires a few tricks.
First, we need some means to test if a function value is \(0\), or really close enough to \(0\). As values are floating point, they may not be exactly \(0\). As such, we use a tolerance to check whether two numbers are approximately equal. Here is a function to do so with a default tolerance (1e-8 is basically sqrt(eps()), the square root of machine tolerance):
close_enough(x, y; tol= 1e-8) = abs(x - y) < tol
Let's check to see it working:
close_enough(1.1, 1, tol=0.01) ## difference is 0.1 > 0.01
## false
close_enough(1.01, 1, tol=0.1) ## difference is 0.01 < 0.1
## true
With this, we can begin to sketch out our flow with a simple problem: find a zero of \(f(x) = x^2 - 2\) between \(0\) and \(2\). Clearly, the answer is \(\sqrt{2}\).
a = 0; b=2
f(x) = x^2 - 2
mid = (a + b)/2
## 1.0
close_enough(f(mid), 0, tol=0.01)
## false
Are we "close enough"? Clearly not (\(f(1) = -1\) and is nowhere near \(0\)). So we close in. We move a or b depending on the sign: if f(mid) has a different sign than f(a), then we set b = mid; otherwise we set a = mid. This is done quickly as follows using the ternary operator:
f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
## 1.0
The parentheses are needed around the assignments to b and a, as otherwise the ternary operator does not work as desired.
Now we update the midpoint and check again
mid = (a + b)/2
## 1.5
close_enough(f(mid), 0, tol=0.01)
## false
Still, not good enough, so we repeat the three key steps:
f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
## 1.5
mid = (a + b)/2
## 1.25
close_enough(f(mid), 0, tol=0.01)
## false
Again, still not good enough. We find that we need to repeat the above three lines several times to get an answer, even at the very undemanding tolerance of \(0.01\).
The fact that the method is slow to converge would be a problem for us, were we not able to let the computer repeat the steps until it is done. As such, we need some way to repeat a process an arbitrary number of times. A for loop repeats a specific number of times; not quite right. For this task, we want a while loop, which continues looping until some condition is met (our condition of close_enough).
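For example, the three steps above for \(f(x) = x^2 - 2\) can be automated with a while loop. Here is a stripped-down sketch (halve_until is our own name; it omits the error checking of a full implementation), run at the same undemanding tolerance of \(0.01\):

```julia
## a bare-bones loop, assuming f(a) and f(b) have opposite signs
function halve_until(f, a, b; tol = 0.01)
    mid = (a + b)/2
    while abs(f(mid)) >= tol
        f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
        mid = (a + b)/2
    end
    mid
end

f(x) = x^2 - 2
halve_until(f, 0, 2)  ## 1.4140625, approximately sqrt(2)
```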
Our code is as follows. The basic algorithm is fairly straightforward to implement. Try and read over the code to see what is being done.
function bisection_method(f::Function, bracket::Vector)
a,b = bracket[1:2];
## redefine to be self-contained
close_enough(x) = abs(x) < 1e-8
## check endpoints
if f(a) * f(b) >= 0
error("Function values at endpoints must have different signs, and not be zeros")
end
update_mid(a, b) = (a + b)/2
## begin
mid = update_mid(a, b)
ctr = 1
while ( !close_enough(f(mid)) )
## update a or b and then mid
f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
mid = update_mid(a, b)
ctr = ctr + 1
end
println("Converged to $mid in $ctr steps")
return(mid)
end
To use this, copy and paste the above function definition into a julia session.
Okay, let's look at the function \(f(x) = -16x^2 + 32x\). We know that 0 and \(2\) are roots. Let's see if our algorithm finds them:
f(x) = -16x^2 + 32x
bisection_method(f, [-1, 1]) ## should find 0
## 0.0
bisection_method(f, [1, 3]) ## should find 2
## 2.0
Okay, it seems to work. Let's try it on a less trivial problem. We know \(\sin(x)\) and \(\cos(x)\) cross in the interval \([0, \pi/2]\). If we are too tired to remember where, we can simply ask:
f(x) = cos(x) - sin(x)
out = bisection_method(f, [0, pi/2])
## 0.7853981633974483
This, of course, looks better as \(\pi/4\):
out - pi/4
## 0.0
Example: Graphical and numerical answers
One needs to know where to look in order to use the bisection method. The basic one-two punch is
graph the function to quickly identify values \([a,b]\) which bound a zero, then
use the bisection method to find the zero to many decimal points.
Here we illustrate with the problem of finding all intersection points of \(e^x = x^4\) over the interval \([0,10]\). That is, for the function \(f(x) = e^x - x^4\) find the zeros.
A quick plot shows that the function has such a wide range that looking over the entire domain at once will be problematic:
f(x) = exp(x) - x^4
plot(f, 0, 10)
Instead, we look between \([0,3]\) and \([8,9]\). A quick confirmation shows these are good choices to use. For example, between \(8\) and \(9\) we have:
plot([f, x->0], 8, 9) ## plot f and the line y=0
So we find the values of the zero in the bracketed region \([8,9]\):
bisection_method(f, [8, 9])
## 8.61316945643921
The root within \([0, 3]\) is found with:
bisection_method(f, [0, 3])
## 1.4296118253841996
The bisection method, while easy to describe and understand, is not the most efficient. The Roots package provides the fzero function, which is used in the same manner as bisection_method. This function employs a more modern algorithm with better convergence properties.
For example, to find a root of \(f(x) = 2x \cdot \exp(-20) - 2 \cdot \exp(-20x) + 1\) in the interval \([0,1]\) we have:
using Roots
f(x) = 2x*exp(-20) - 2*exp(-20x) + 1
fzero(f, [0, 1])
## 0.03465735902085386
The fzero function is actually an interface to different root-finding algorithms. When called as above (the second argument being a vector), it uses a bracketing approach. When p is a polynomial object, the call fzero(p) will basically use the roots command, though the actual algorithm does a better job when the zeros have multiplicities, e.g. if \(p(x) = (x-1)^2 \cdot (x-2)^3\).
In the bisection method algorithm we checked that the value of \(f\) at \(a\) and \(b\) had opposite signs by looking at \(f(a)\cdot f(b)\). Why did this work?
There is another root in the interval \([-10, 0]\) for the function \(f(x) = e^x - x^4\). Find its value numerically:
Let \(b = 26\) and \(f(x) = x^2 - b \cdot x \cdot \log(x)\). This function has two zeros on the positive \(x\) axis. You are asked to find the largest (graph and bracket...):
The airy function has infinitely many negative roots, as the function oscillates when \(x < 0\). In a previous problem we graphically found the largest root. Now find the second largest root, using the graph to bracket the answer and then solving.
Use the bisection method to find the root of
\[ g(x) = \tan(\theta) x + \frac{32/k}{200 \cos\theta} x - \frac{32}{k} \log\left(\frac{200 \cos\theta}{200\cos\theta - x}\right). \]
for \(\theta = \pi/4\) and \(k=1/2\). If \(a = 200 \cos\theta\) use the values \(a/2\) and \(a-1\) to bracket the root.