Newton's method using julia

Newton's method is an old method for approximating a zero of a function, \(f(x)\):

\[ f(x) = 0 \]

Previously we discussed the bisection method, which applies to a continuous function \(f(x)\) that changes sign between \(a\) and \(b\), points which bracket a zero. Not only did we need to find these bracketing points -- not hard from a graph -- but, more importantly, the actual algorithm is pretty slow.

If the function \(f\) is sufficiently differentiable, then Newton's method may work to find a zero. Unlike the bisection method which is slow but guaranteed to find a root by the intermediate value theorem, Newton's method is fast (once it is close) but has no such guarantee of converging. In this project, we'll see how to implement the algorithm, try some examples, and then look at what can go wrong.

Basic idea

The idea behind Newton's method is simple -- linear approximation. That is, most functions at any given point are well approximated by the tangent line at that point. If we have a good guess \(x_n\), then we can improve this guess iteratively by replacing it with the zero, \(x_{n+1}\), of the tangent line at \((x_n, f(x_n))\).

Embedded Image

A simple picture shows that we have a triangle with base \(x_{n} - x_{n+1}\), rise \(f(x_n) - 0\), and hypotenuse along the tangent line, whose slope is \(f'(x_n)\). Using the "rise over run" formula:

\[ f'(x_n) = \frac{f(x_n)}{x_{n} - x_{n+1}}. \]

The basic algorithm of Newton's method is then:

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]

Some books write the right-hand side as \(x_n - f'(x_n)^{-1} f(x_n)\), a form that generalizes to different settings.

Like the bisection method, Newton's method is an iterative method. One begins with a (suitable) guess \(x_0\). From that the algorithm produces \(x_1\) which is used to produce \(x_2\), etc. The idea is that one eventually will settle on a value \(x_n\) sufficiently close to the desired root.

Mathematically, the indices indicate that the right-hand side is computed with the current value and the result becomes the next value. This is exactly what assignment does in julia, so the above simply becomes:

x = x - f(x)/fp(x)

Here f(x) is the function and fp(x) its derivative. This line starts with a previously defined value of x and updates it accordingly.

The updating is continued -- by executing the exact same command -- until either the algorithm has gotten close enough to an answer (i.e., it has converged) or we have given up on it converging.

There are two ways to decide that a value is close enough to the answer:

  • the sequence of x values stops changing by much
  • the values f(x) get close enough to zero.

Both concepts require a tolerance. For the first, it is easiest to stop when the printed value of x stops changing. That value shows 16 digits, so round-off errors can be an issue (e.g., the first 15 digits can stay fixed while the last one oscillates), but it is easy to use at the command line.

For the second, we will use a standard tolerance: the square root of the machine precision at \(1\) (the gap between \(1\) and the next larger floating point number), given conveniently with:

tol = sqrt(eps())
## 1.4901161193847656e-8

Approximate answers only:

It is important to realize that the actual answer is not likely to be the value computed by Newton's method, which we at times call xstar. In most cases, the true answer will be irrational and xstar a floating point number, which ultimately can never be better than an approximation to an irrational number. This is why we need to be clear about what we mean by "close enough."

Here is an example to find a zero of the function: \(f(x) = x^3 - 2x - 5\).

f(x) = x^3 - 2x - 5
fp(x) = 3x^2 - 2
x = 2
## 2
x = x - f(x)/fp(x)
## 2.1
x = x - f(x)/fp(x)
## 2.094568121104185
x = x - f(x)/fp(x)
## 2.094551481698199
x = x - f(x)/fp(x)
## 2.0945514815423265
x = x - f(x)/fp(x)      # x stopped changing
## 2.0945514815423265

We can see that the function evaluated at the resulting value of x is less than the tolerance in absolute value:

f(x)
## -8.881784197001252e-16
abs(f(x)) < tol
## true

You can see in this case that the convergence happens quickly as soon as the algorithm gets close.

Some history

Newton looked at this same example in 1699 (B.T. Polyak, Newton's method and its use in optimization, European Journal of Operational Research, 2007; 181(3):1086-1096), though his technique was slightly different: he did not use the derivative per se, but rather an approximation based on the fact that his function was a polynomial (though it was identical to the derivative). Raphson (1690) proposed the general form, hence the usual name, the Newton-Raphson method.


Practice

Question

Using Newton's method to find \(\sqrt{k}\) (by solving for roots of \(f(x) = x^2 - k\)) is also referred to as the Babylonian method, due to its origins. The resulting method

\[ x_{n+1} = \frac{1}{2}(x_n + \frac{k}{x_n}) \]

is described by the first-century Greek mathematician Hero of Alexandria.
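This update is just the Newton step applied to \(f(x) = x^2 - k\), since \(f'(x) = 2x\):

\[ x_{n+1} = x_n - \frac{x_n^2 - k}{2x_n} = \frac{1}{2}\left(x_n + \frac{k}{x_n}\right). \]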

Let \(k=15\) and \(x_0\) be 3.24. What is the value of \(x_3\)?

Question

The function \(f(x) = \sin(x)\) has derivative \(f'(x) = \cos(x)\). Use Newton's method to solve \(f(x) = 0\) starting at \(3\). Repeat until the digits stop changing. What value do you get?

(This can be used to compute \(\pi\) numerically, as the convergence is very fast. Here it takes 4 steps to verify.)

Question

Let \(f(x) = x^2 - 3^x\). This has derivative \(2x - 3^x \cdot \log(3)\). Starting with \(x_0=0\), what does Newton's method converge on?

Using the function value to decide

Here we use the other concept of closeness to check for convergence. That is, we repeat the update until abs(f(x)) < tol is true. This idea generalizes to higher dimensions and is more in line with the task at hand.

As with the bisection method, we define a function to do these comparisons:

close_enough(x; tol = sqrt(eps())) = abs(x) < tol

We revisit a problem from a previous project, finding zeroes of the function \(f(x) = \exp(x) - x^4\). We know from previous work that there are three of them. Let's find one of them using a somewhat large tolerance:

f(x) = exp(x) - x^4
fp(x) = exp(x) - 4x^3
x = 2
## 2
x = x - f(x)/fp(x);
(x, close_enough(f(x), tol=0.0001)) ## print out current x and close_enough
## (1.650117283770851,false)
x = x - f(x)/fp(x);
(x, close_enough(f(x), tol=0.0001)) 
## (1.4772564896555644,false)
x = x - f(x)/fp(x);
(x, close_enough(f(x), tol=0.0001)) 
## (1.4324534830046274,false)
x = x - f(x)/fp(x);
(x, close_enough(f(x), tol=0.0001))
## (1.4296227107803885,true)

Now that we are "close enough", let's see how close:

f(x)
## -8.175795725229307e-5

This last example suggests an algorithm for Newton's method: repeat the update step while the value is not within tolerance, giving up if it has gone on too long to have any hope of success.

We can implement this in julia, as it is similar to what we did for the bisection method. The only new wrinkle is that we keep track of a counter to see if we have tried too many times, as specified below by max_steps.

function nm(f, fp, x; tol=sqrt(eps()))
   
    ctr, max_steps = 0, 100
     
    while (abs(f(x)) > tol) && ctr < max_steps
        x = x - f(x) / fp(x)
        ctr = ctr + 1
    end

    ctr >= max_steps ? error("Method did not converge") : return (x, ctr)
    
end

Here we try it:

f(x) = x^3 - 5x + 1
fp(x) = 3x^2 - 5
xstar, ctr = nm(f, fp, 0)
## (0.20163967572339103,3)

You can try it too -- copy and paste the above in. (Alternatively, you can use the newton function of the Roots package, which instead stops when \(x_{n+1}\) and \(x_n\) are within a tolerance of each other.)
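For instance, assuming the Roots package is installed, a quick check with the same f and fp might look something like:

using Roots
newton(f, fp, 0)        # uses the f and fp defined just above

This should return a value close to the xstar found by nm.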

Practice

Question

Repeat the problem of finding a root of \(f(x) = \exp(x) - x^4\) starting at \(x=2\). How many steps does it take with the default tolerance used in nm?

Question

If we repeat with \(f(x) = \exp(x) - x^4\) but start now at \(x=8\) where does the algorithm converge?

Question

Let \(f(x) = \sin(x) - \cos(k\cdot x)\) for \(k=9\).

Starting at \(\pi/(2k)\), solve for the root returned by Newton's method.

Numeric derivatives

In order to use Newton's method we need to compute \(f'(x)\) by hand. Though not so hard for the examples we have used so far, this may prove tedious, or the derivative may even be hard to compute. (In higher-dimensional generalizations, it can be difficult.)

However, Newton's method is actually fairly robust: other values related to the derivative can be used in its place.

The secant method

The secant method is perhaps the oldest numerical root-finding method, dating back over 3000 years, well before Newton's method. Rather than using the derivative at \(x_i\) to compute \(x_{i+1}\), the slope of the secant line between \((x_{i-1}, f(x_{i-1}))\) and \((x_i, f(x_i))\) is used. This method will also converge to a zero given a good starting point, though not nearly as quickly as Newton's method.

You can check -- if you want -- by repeating the update step below until the change in x2 is within your tolerance:

x2, x1 = 1, 2           # initial guess of 2
## (1,2)
f(x) = x^2 - 2          # some function
fp(x1,x2) = (f(x1) - f(x2))/(x1 - x2)
x2, x1 = x2 - f(x2)/fp(x1, x2), x2 # update step
## (1.3333333333333333,1)

(How many update steps does it take for x1 and x2 to become the same value (after which, the above method will fail)?)

One can also use approximate derivatives based on forward differences in place of \(f'(x)\) in the formula. Again, this won't be as fast.
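For example, here is a minimal sketch of that idea (the function name fd_newton and the step size h are our own choices for illustration, not from any package):

function fd_newton(f, x; h=1e-6, tol=sqrt(eps()), max_steps=100)
    ctr = 0
    while abs(f(x)) > tol && ctr < max_steps
        fp_approx = (f(x + h) - f(x)) / h   # forward difference in place of f'(x)
        x = x - f(x) / fp_approx
        ctr = ctr + 1
    end
    (x, ctr)
end

Calling fd_newton(f, 1) with the f(x) = x^2 - 2 defined above should approach sqrt(2), if typically in a few more steps than with the exact derivative.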

In a previous project we discussed automatic differentiation. As these values are basically exact, the automatic derivative can replace \(f'(x)\) and the method will be just as quick to converge.

To see how this is done, we load in the automatic derivative operator D from the Roots package:

using Roots

Newton's method can then be used quite simply. For example, let \(f(x) = \exp(x) - x^4\) and set \(x_0 = -1\). We can find a root with the following command:

f(x) = exp(x) - x^4
xstar, ctr = nm(f, D(f), -1)
## (-0.8155534188761755,4)

(In fact, the newton function from Roots allows you to simply call newton(f, -1); the derivative is then computed using D.)

There are also very fast algorithms which do not require a derivative. The fzero function from Roots uses a recent method due to Thukral that is a fast, iterative, root-finding algorithm requiring no derivative. You can try it with the previous f through:

xstar2 = fzero(f, -1)
## -0.8155534188089607

The values are different, as they are using different default tolerances. We can see that in this case fzero is "closer":

(f(xstar2), f(xstar))
## (-1.1102230246251565e-16,-1.7557744147467247e-10)

Practice

Question

Let

\[ f(x) = 4x^4 -5x^3 + 4x^2 -20x - 6 \]

Apply Newton's method with \(x_0=0\) using an automatic derivative. What value does it converge to?

Question

Let's try with a function where the derivative is not known easily. If we set

\[ f(x) = x^x - 2 \]

Can we find a root using Newton's method, where \(x > 0\)?

We graph the function to see, using a smallish interval at first:

f(x) = x^x - 2
plot([f, x -> 0], 0, 2)

Eyeing this, we pick an initial point, \(1\), for Newton's method to the right of the minimum, which appears to be around \(x=0.35\).

xstar, ctr = nm(f, D(f), 1);

What is the value of xstar?

Question Finding the average of some numbers the hard way

Let's find the average of 5 numbers in a roundabout way. We define our 5 values and a function which computes the sum of squared differences between a given value and these numbers.

vals = [1, 4, 5, 6, 8]
## [1,4,5,6,8]
f(x) = sum([(x - i)^2 for i in vals])

(This can also be expressed more compactly with the "dot" notation f(x) = sum((x - vals).^2) which may be preferable to some.)

It is known that the minimum value of f(x) will occur at the mean of the 5 numbers.
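Indeed, writing the five values as \(v_1, \dots, v_5\) and setting the derivative to zero shows this directly:

\[ f'(x) = \sum_{i=1}^{5} 2(x - v_i) = 0 \quad \Longrightarrow \quad x = \frac{1}{5}\sum_{i=1}^{5} v_i, \]

which is the mean.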

When trying to find the minimum of a function, a consideration of the critical points of \(f\) is in order. Critical points satisfy \(f'(x) = 0\) (Fermat's condition). As such, we apply Newton's method to \(f'\) -- not \(f\). This requires us to pass the first and second derivatives of \(f\) to our nm function:

(x, ctr) = nm(D(f), D2(f), 4);

What value does this return (compare with mean(vals))?

Various issues

As great as Newton's method is, it won't always work for various reasons, some of which are described in the following. Here is what you need to keep in mind. Newton's method works well if

  • The zero is a simple root -- that is of multiplicity 1

  • \(|f'(x)|\) is not too small (if the tangent line is nearly flat, the next guess will be far from the previous one)

  • \(|f''(x)|\) is not too big (the function doesn't have so much curvature that the tangent line is a poor approximation)

  • The initial guess is not too far from a zero

The above points come from the following formula which you can find in many texts.

\[ \Delta x_{i+1} = \frac{f''(\alpha)}{2f'(\alpha)}(\Delta x_i)^2 + \text{error} \]

which is valid when \(f(x)\) satisfies \(f(\alpha) = 0\), the third derivative exists near \(\alpha\), and \(\Delta x_i = x_i - \alpha\) is the error between the zero and the \(i\)th estimate. The next error is basically a constant times \(\Delta x_i^2\). This is interpreted as saying there is quadratic convergence in the error, as the next error is related to the square of the previous one.
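One way to see where this comes from (a sketch, assuming \(f'\) does not vanish near \(\alpha\)): expanding \(f\) about \(x_i\) and using \(f(\alpha) = 0\) gives

\[ 0 = f(\alpha) = f(x_i) - f'(x_i)\Delta x_i + \frac{1}{2} f''(\xi)(\Delta x_i)^2 \]

for some \(\xi\) between \(\alpha\) and \(x_i\). Dividing by \(f'(x_i)\) and using \(x_{i+1} = x_i - f(x_i)/f'(x_i)\) leaves

\[ \Delta x_{i+1} = x_{i+1} - \alpha = \frac{f''(\xi)}{2f'(x_i)}(\Delta x_i)^2, \]

which is the formula above once \(\xi\) and \(x_i\) are replaced by \(\alpha\).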

Now we look at some cases where the above points do not hold.

The initial guess is nowhere near the resulting zero

Let \(f(x) = \sin(x) - x/4\) and \(x_0 = 2\pi\). This value is deliberately a poor choice:

f(x) = sin(x) - x/4
fp(x) = cos(x) - 1/4
nm(f, fp, 2pi)
## (-2.474576797589688,21)

Though julia makes this happen fast, note that the counter got to 21 before converging and the answer is nowhere near the initial guess. This trace might show why:

Embedded Image

Question

When \(|f'(x)|\) is too close to \(0\), the path can jump around a lot. In the figure, what was the longest jump?


Question

The method did find a zero, but the initial guess was nowhere near the final zero. How close was the closest zero to the initial guess?

Function has a poor shape

Let \(f(x) = x^{1/3}\). We know the root is 0. Let's see what happens if we use Newton's method. We have to be careful though, as julia thinks that cube roots of negative numbers (via x^(1/3)) are NaN, not a number. (You can avoid this by making your number complex, e.g., x + 0*im, but then the real cube root is not the value returned -- it is just one of three complex roots, and not the chosen one.)

So we define our function using julia's cbrt function, which works as we desire for negative numbers, as follows:

f(x) = cbrt(x)
fp(x) = (1/3) * cbrt(x) / x        # d/dx x^(1/3) = (1/3)x^(-2/3), valid for x != 0
xstar, ctr = nm(f, fp, 2)
## "ErrorException(\"Method did not converge\")"

Question

Despite all our care with the derivative, the method did not converge in \(100\) steps. Can you see why from this trace?

Embedded Image

Question Solve by hand

For \(f(x) = x^{1/3}\), simplify the expression by hand:

\[ x - f(x)/f'(x) \]

What do you get?

Question

Apply Newton's method to \(f(x) = (x-2)^7 \cdot (x-3) \cdot (x-4)\) starting at 1.9981. The algorithm does not converge to 2 -- an obvious root. From a plot of \(f(x)\) explain why not:

Cycles

Sometimes, the process can cycle even for reasonable functions.

Question

Let \(f(x) = x^3 - 5x\). Starting with \(x_0=1\), compute three steps of Newton's method. What are the terms in the sequence produced?

Question Always cycles...

Here is a pathological example where the values always cycle, no matter where you start, except at 0.

Let \(f(x) = \sqrt{|x|}\). This is the one-sided square root function turned into an even function. We could also have defined it by:

f(x) = x >= 0 ? sqrt(x) : sqrt(-x)

where the ternary operator a ? b : c evaluates a and, if true, returns b, otherwise c.

This makes it easier to write the derivative of the function in Julia:

fp(x) = x >=0 ? (1/2)*sqrt(x)/x : -(1/2)*sqrt(-x)/(-x)

To see what happens when using Newton's method, let's start at \(x=2\):

x = 2
## 2
x = x - f(x)/fp(x)
## -2.0
x = x - f(x)/fp(x)
## 2.0
x = x - f(x)/fp(x)
## -2.0

Try again with \(x=3.0\). What sequence do you get?