Solving for zeros with julia

Solving for zero is a mathematical skill taught early on. In some cases, such as with linear equations, solving for zeros can be done directly using algebra. Similarly, in the case of factorable polynomials, we are taught to factor and then set each term to 0 to find the possible solutions.

However, in general, the problem of finding one (or all) solutions to the equation

\[ f(x) = 0 \]

for an arbitrary \(f\) has no well-defined process.

A related problem is to find one (or all) solutions to an equation of this type:

\[ f(x) = g(x) \]

Conceptually this is identical to the above, as we just set \(h(x) = f(x) - g(x)\) and solve for when \(h(x)\) is \(0\).
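For instance, the equation \(\cos(x) = x\) can be recast this way (the names f, g, and h here are our own, just for illustration):

```julia
## recast cos(x) == x as h(x) == 0; f, g, h are illustrative names
f(x) = cos(x)
g(x) = x
h(x) = f(x) - g(x)

h(0.0)   ## 1.0, so x = 0 is not a solution; any zero of h solves f(x) == g(x)
```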

Here we discuss a few elementary means to find zeros with julia, leaving some others for a later time.

Zeros of a polynomial

Polynomials are special functions, in that their relatively simple form allows for many explicit things to be known. E.g., for quadratic polynomials, one of these facts is the quadratic formula, which explicitly gives the roots of a quadratic polynomial. (A "root" of a polynomial is just a polynomial-specific name for a zero.)

For example, if we have the quadratic polynomial \(2x^2 + 3x - 2\) we can solve for the roots with:

a = 2; b = 3; c = -2
discr = b^2 - 4*a*c
## 25
( (-b + sqrt(discr))/(2a), (-b - sqrt(discr))/(2a) )
## (0.5,-2.0)

If you wanted to write a function to do this, it would be mostly straightforward, save the detail of needing to make a negative number complex in order to take its square root:

## find roots of ax^2 + bx + c
function quadratic(a, b, c)
  discr = b^2 - 4*a*c
  sq = (discr >= 0) ? sqrt(discr) : sqrt(discr + 0im)  ## complex sqrt only when needed

  [(-b - sq)/(2a), (-b + sq)/(2a)]
end

And to find the roots of \(x^2 + x - 1\) we have:

quadratic(1, 1, -1)
## [-1.618033988749895,0.6180339887498949]
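As a check of the complex branch, the polynomial \(x^2 + 1\) has no real roots. Here is quadratic again, essentially as above, repeated only so this snippet stands alone (and with >= so a zero discriminant stays real):

```julia
## quadratic essentially as above, repeated so this snippet is self-contained
function quadratic(a, b, c)
  discr = b^2 - 4*a*c
  sq = (discr >= 0) ? sqrt(discr) : sqrt(discr + 0im)
  [(-b - sq)/(2a), (-b + sq)/(2a)]
end

quadratic(1, 0, 1)   ## the two complex roots of x^2 + 1, i.e. -im and im
```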

There are similar formulas for third and fourth degree polynomials. However, as Abel and Galois (the latter dying at the tender age of 20) demonstrated, in general there can be no such formula for the roots of a fifth or higher degree polynomial. There are still facts known about such polynomials. For example, the Fundamental theorem of algebra states that every polynomial of degree \(n\) has exactly \(n\) roots, counting complex roots and multiplicities.

Finding them, though, is left to numeric methods. In julia there is an add-on package that will do this work for us. Using add-on code in julia is relatively painless. In this case, the code is in the Polynomial package. Once installed (Pkg.add("Polynomial")), we load that package with:

using Polynomial        ## may need Pkg.add("Polynomial")

The command to find the roots of the polynomial \(f(x)\) is simply roots. That's easy. However, specifying the polynomial is done differently than just defining \(f(x)\) as a function.

A general polynomial is defined through its coefficients:

\[ a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0. \]

Just like with our quadratic function, to represent a polynomial most efficiently, only the coefficients are needed. We place them into a vector with the leading coefficient first:

[a_n, a_n1, a_n2, ..., a1, a0]

Or to be specific, if \(f(x) = -16x^2 + 32 x\) then this is

\[ -16 x^2 + 32 x + 0 \]

So \(a_2 = -16\), \(a_1 = 32\) and importantly \(a_0 = 0\), so we would use [-16, 32, 0].

The key point is that we must put \(0\) for any missing terms and a \(1\) when the term is a simple monomial like \(x^2\).

To create a polynomial object from this specification is done with the Poly constructor:

p = Poly([-16, 32, 0])
## Poly(-16x^2 + 32x^1)

And finding the roots is done with the roots function:

roots(p)
## [2.0,0.0]

That's it.

Poly objects have a few other functions defined for them including the standard mathematical operations.

To illustrate, let's build up the polynomial \(p(x) = (x-1)(x-2)^2(x-3)^2\):

p1, p2, p3 = Poly([1,-1]), Poly([1, -2]), Poly([1, -3])
## (Poly(1x^1 + -1),Poly(1x^1 + -2),Poly(1x^1 + -3))
p = p1 * p2^2 * p3^2
## Poly(1x^5 + -11x^4 + 47x^3 + -97x^2 + 96x^1 + -36)

To see that we have the three distinct roots we started with from the factored form:

roots(p)
## [0.9999999999999978,1.9999998107957662,2.0000001892043513,3.000000300887259,2.999999699112625]

The polyval function can be used to evaluate the polynomial. For example, here is how we create a function from a polynomial object:

f(x) = polyval(p, x)

The polyval function does the hard work. It multiplies the coefficients of p against the proper powers of x and adds the results.
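That computation can be sketched with a helper of our own naming (polyval itself may well be implemented differently, e.g. via Horner's rule):

```julia
## evaluate a polynomial given by descending coefficients; an illustrative
## helper showing what polyval computes, not its actual implementation
function polyval_sketch(coeffs, x)
    n = length(coeffs)
    total = zero(x)
    for (i, c) in enumerate(coeffs)
        total += c * x^(n - i)   ## c is the coefficient of x^(n-i)
    end
    total
end

polyval_sketch([-16, 32, 0], 2.0)   ## 0.0, as 2 is a root of -16x^2 + 32x
```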

For example, we can graph f with

plot(f, -1, 3)

What isn't easy to do without more work is to start with a function object, such as f(x) = x^2 + 3x and produce a Poly object without just typing it in.

Practice

Question

Use the roots command to find the largest real root of the polynomial \(x^2 + x - 1\).

Question

Use the roots command to find the largest real root of the polynomial \(x^3 - x + 13\).

Question

For the polynomial defined by

p = Poly([1, 2, 3, 4, 5]);

find the value of this polynomial when \(x=2\).

Graphical solutions

More generally, the equation \(f(x) = 0\) may not have any special form leading to a known solution. In this case, there are various techniques to find zeros. Here we mention graphing, such as is done with a graphing calculator. In the next section, we discuss the bisection algorithm for root finding.

Example:

Finding a zero

The flight of an arrow can be modeled using various functions, depending on assumptions. Suppose an arrow is launched in the air from a height of 0 feet above the ground at an angle of \(\theta = \pi/4\). With a suitable choice for the initial velocity, a model without wind resistance for the height of the arrow at a distance \(x\) units away may be:

\[ j(x) = \tan(\theta) x - 16\left(\frac{x}{200 \cos\theta}\right)^2. \]

With a wind resistance given by \(k\), again with some units, a similar equation can be constructed. It takes a different form:

\[ g(x) = \tan(\theta) x + \frac{32/k}{200 \cos\theta} x - \frac{32}{k} \log\left(\frac{200 \cos\theta}{200\cos\theta - x}\right). \]

For each model, we wish to find the value of \(x\) after launching where the height is modeled to be 0. That is, how far will the arrow travel before touching the ground?

For the model without wind resistance, we can graph the function easily enough. Let's guess the distance is no more than 500 feet:

begin
  theta = pi/4
  j(x) = tan(theta) * x - 16(x/(200cos(theta)))^2
  plot(j, 0, 500)
end

Well, we haven't even seen the peak yet. Better to do a little spade work first. This is a quadratic equation, so we can use our function quadratic. We just need to identify a, b and c.

The value of a comes by factoring out the \(x^2\):

a = - 16 * (1/(200cos(theta)))^2
## -0.0008

The values for b and c should be clear, so we get

begin
  a = - 16 * (1/(200cos(theta)))^2
  b = tan(theta)
  c = 0
  roots(Poly([a,b,c]))  ## or  quadratic(a, b, c)
end
## [1249.9999999999998,0.0]

We see that 1250 is the value. So we plot over this domain to visualize the flight:

plot(j, 0, 1250)

As for the model with g, we first define the function. This same function appeared in a previous example. We copy that work here:

function g(x; theta = pi/4, k = 1/2)
     a = 200*cos(theta)
     b = 32/k
     tan(theta)*x + (b/a)*x - b*log(a/(a-x))
end

A quick plot over the same interval, \([0, 1250]\) yields:

plot(g, 0, 1250)
## "DomainError()"

Oh, "Domain Error." Of course, when \(x \geq a\), the evaluation of \(\log(a/(a-x))\) will fail. We try on the reduced interval avoiding the obvious asymptote at a by subtracting \(1\):

a = 200*cos(theta)
plot(g, 0, a - 1) 

Now we can see the zero is between 130 and 140. We re-plot:

plot(g, 130, 140)

Finally, we plot between 134 and 135 and draw the line for \(y=0\) to see the zero clearly:

plot([g, x -> 0], 134, 135)

The answer is approximately \(134.8\).

Finally, we plot both graphs at once to see that it was a very windy day indeed.

a = 134.8
plot([j, u -> u < a ? g(u) : NaN] , 0, 1250)

Practice

Question

Repeat the above with a value of \(k=0.5\). How far does the arrow travel, to one decimal point?

Question

In an analysis of rainbows, Airy developed a special function implemented as airy in julia. The zeros are all negative. The first one is between \(-3\) and \(-1\). Find it graphically.

Question

The polynomial \(f(x) = x^5 - 6x^3 - 6x^2 - 7x - 6\) has three real roots. Which of the following values is one of them? Try to solve this graphically.

Bisection algorithm

The last example had us graphically "zoom" in on a zero, and led us to an estimate to \(1\) or \(2\) decimal points. Trying to get more than that accuracy graphically is at best tedious. Here we discuss a method to get as much accuracy as is numerically possible.

The bisection algorithm is a simple, iterative procedure for finding a numeric, approximate root of a continuous function over the closed interval \([a,b]\) provided \(f(a)\) and \(f(b)\) are not \(0\) and do not share the same sign. We say \(a\) and \(b\) bracket a root. With these assumptions, the fact there is at least one root (possibly more) is due to the intermediate value theorem.

The basic idea is that the midpoint of \([a,b]\), \(M = (a + b)/2\), is tested for its function value. If \(f(M) = 0\), great, we are done. If it has the opposite sign of \(f(a)\), then a root must be in the interval \([a,M]\), so the problem is reduced to a smaller interval. Otherwise, it has the opposite sign of \(f(b)\) and the problem is reduced to \([M,b]\). At this point, the algorithm is repeated. As each step halves the interval length, the method must eventually converge to an answer within any specified tolerance. An alternative name for this algorithm is the divide-and-conquer algorithm.

To code this requires a few tricks.

First, we need some means to test if a function value is \(0\), or really close enough to \(0\). As values are floating point, they may not be exactly \(0\). As such, we use a tolerance that checks if a number is approximately \(0\). Here is a function to do so with a default tolerance (1e-8 is basically sqrt(eps()), the square root of machine tolerance):

close_enough(x, y; tol= 1e-8) = abs(x - y) < tol

Let's check to see it working:

close_enough(1.1,  1, tol=0.01) ## difference is 0.1 > 0.01
## false
close_enough(1.01, 1, tol=0.1) ## difference is 0.01 < 0.1
## true

With this, we can begin to sketch out our flow with a simple problem: find a zero of \(f(x) = x^2 - 2\) between \(0\) and \(2\). Clearly, the answer is \(\sqrt{2}\).

a = 0; b=2
f(x) = x^2 - 2
mid = (a + b)/2
## 1.0
close_enough(f(mid), 0, tol=0.01)
## false

Well, are we "close enough"? Clearly not (\(f(1) = -1\), which is nowhere near \(0\)). So we close in. We move a or b depending on the sign. If f(mid) has a different sign than f(a), then we set b = mid; otherwise we set a = mid. This is done quickly as follows using the ternary operator:

f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
## 1.0

The parentheses are needed around the assignments to b and a as otherwise the ternary operator does not work as desired.
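To see the update step in isolation, it can be wrapped in a small helper function (update_bracket is a name of our own):

```julia
## one bisection update: shrink [a, b] to the half that still brackets a zero
function update_bracket(f, a, b)
    mid = (a + b) / 2
    f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
    (a, b)
end

f(x) = x^2 - 2
update_bracket(f, 0.0, 2.0)   ## (1.0, 2.0), as the zero sqrt(2) lies in the right half
```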


Now we update the midpoint and check again

mid = (a + b)/2
## 1.5
close_enough(f(mid), 0, tol=0.01)
## false

Still, not good enough, so we repeat the three key steps:

f(a) * f(mid) < 0 ? (b = mid) : (a = mid)
## 1.5
mid = (a + b)/2
## 1.25
close_enough(f(mid), 0, tol=0.01)
## false

Again, still not good enough. We find that we need to repeat the above three lines several times to get an answer, even at the very undemanding tolerance of \(0.01\).

The fact that the method is slow to converge would be a problem for us, were we not able to let the computer repeat the steps until it is done. As such, we need some way to repeat a process an arbitrary number of times. A for loop repeats a specific number of times. Not quite right. For this task, we want a while loop which will continue looping until some condition is met (our condition of close_enough).
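As a warm-up for the while loop, here is a sketch (with halving_steps a name of our own) that counts how many halvings shrink a length below a tolerance, which suggests how many iterations a bisection over an interval of that length might take:

```julia
## while loop warm-up: count halvings until a length drops below tol
function halving_steps(len, tol)
    steps = 0
    while len >= tol
        len = len / 2
        steps += 1
    end
    steps
end

halving_steps(2.0, 0.01)   ## 8, since 2/2^8 = 0.0078125 < 0.01 <= 2/2^7
```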

Our code is as follows. The basic algorithm is fairly straightforward to implement. Try and read over the code to see what is being done.

function bisection_method(f::Function, bracket::Vector)
    a,b = bracket[1:2]; 

    ## redefine to be self-contained
    close_enough(x) = abs(x) < 1e-8

    ## check endpoints
    if f(a) * f(b) >= 0
      error("Function values at endpoints must have different signs, and not be zeros")
    end       

    update_mid(a, b) = (a + b)/2

    ## begin
    mid = update_mid(a, b)
    ctr = 1

    while ( !close_enough(f(mid)) )
        ## update a or b and then mid
        f(a) * f(mid) < 0 ? (b = mid) : (a = mid)        
        mid = update_mid(a, b)
        ctr = ctr + 1
    end

    println("Converged to $mid in $ctr steps")

    return mid
end

To use this, copy and paste the above function definition into a julia session.

Okay, let's look at the function \(f(x) = -16x^2 + 32x\). We know that 0 and \(2\) are roots. Let's see if our algorithm finds them:

f(x) = -16x^2 + 32x
bisection_method(f, [-1, 1]) ## should find 0
## 0.0
bisection_method(f, [1, 3])  ## should find 2
## 2.0

Okay, it seems to work. Let's try it on a less trivial problem. We know \(\sin(x)\) and \(\cos(x)\) cross in the interval \([0, \pi/2]\). If we are too tired to remember where, we can simply ask:

f(x) = cos(x) - sin(x)
out = bisection_method(f, [0, pi/2])
## 0.7853981633974483

This, of course, looks better as \(\pi/4\):

out - pi/4
## 0.0

Example:

Graphical and numerical answers

One needs to know where to look in order to use the bisection method. The basic one-two punch is

  • graph the function to identify quickly values \([a,b]\) which bound a zero, then

  • use the bisection method to find the zero to many decimal points.

Here we illustrate with the problem of finding all intersection points of \(e^x = x^4\) over the interval \([0,10]\). That is, for the function \(f(x) = e^x - x^4\) find the zeros.

A quick plot shows that the function has such a wide range that looking over the entire domain at once will be problematic:

f(x) = exp(x) - x^4
plot(f, 0, 10)

Instead, we look between \([0,3]\) and \([8,9]\). A quick confirmation shows these are good choices to use. For example, between \(8\) and \(9\) we have:

plot([f, x->0], 8, 9)      ## plot f and the line y=0

So we find the values of the zero in the bracketed region \([8,9]\):

bisection_method(f, [8, 9])
## 8.61316945643921

The root within \([0, 3]\) is found with:

bisection_method(f, [0, 3])
## 1.4296118253841996
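A quick numeric check that both reported values are indeed (approximate) zeros; since bisection_method stops once \(|f(M)| < 10^{-8}\), both should pass easily:

```julia
## verify the two reported zeros of f(x) = exp(x) - x^4
f(x) = exp(x) - x^4
abs(f(1.4296118253841996)) < 1e-6   ## true
abs(f(8.61316945643921)) < 1e-6     ## true
```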

The Roots package and fzero

The bisection method, while easy to describe and understand, is not the most efficient. The Roots package provides the fzero function, which is used in the same manner as bisection_method. This function employs a more modern algorithm with better convergence properties.

For example, to find a root of \(f(x) = 2x \cdot \exp(-20) - 2 \cdot \exp(-20x) + 1\) in the interval \([0,1]\) we have:

using Roots
f(x) = 2x*exp(-20) - 2*exp(-20x) + 1
fzero(f, [0, 1])
## 0.03465735902085386

The fzero function is actually an interface to different root-finding algorithms. When called as above (the second argument being a vector), it uses a bracketing approach. When p is a polynomial object, the call fzero(p) will basically use the roots command, though the actual algorithm does a better job when the zeros have multiplicities, e.g. if \(p(x) = (x-1)^2 \cdot (x-2)^3\).

Problems

Question

In the bisection method algorithm we checked that the value of \(f\) at \(a\) and \(b\) had opposite signs by looking at \(f(a)\cdot f(b)\). Why did this work?

Question Are there other roots in [-10, 0]?

There is another root in the interval \([-10, 0]\) for the function \(f(x) = e^x - x^4\). Find its value numerically:

Question Relation between \(x^2\) and \(x \log(x)\)

Let \(b = 26\) and \(f(x) = x^2 - b \cdot x \cdot \log(x)\). This function has two zeros on the positive \(x\) axis. You are asked to find the largest (graph and bracket...):

Question

The airy function has infinitely many negative roots, as the function oscillates when \(x < 0\). In a previous problem we graphically found the largest root. Now find the second largest root using the graph to bracket the answer, and then solving.

Question Solve arrow problem with \(a/2\) and \(a-1\)

Use the bisection method to find the root of

\[ g(x) = \tan(\theta) x + \frac{32/k}{200 \cos\theta} x - \frac{32}{k} \log\left(\frac{200 \cos\theta}{200\cos\theta - x}\right). \]

for \(\theta = \pi/4\) and \(k=1/2\). If \(a = 200 \cos\theta\) use the values \(a/2\) and \(a-1\) to bracket the root.