Approximate derivatives in julia

Single-variable calculus has two main concepts: the derivative and the integral, both defined in terms of a third important concept: the limit. The derivative is what we discuss here. The derivative allows us to talk about a tangent line to a curve. This is often described as a line that touches the curve at just one point, but that isn't quite correct, though it is often the case locally. A better description is that it is the line that best approximates the curve at that point. That is, if we were to zoom in to the graph it might appear to look straight, and that straightness would have a slope matching the tangent line. This idea leads to a primary use of the derivative: to approximate functions with simpler lines. Of course, this intuition is informal; the definition is in terms of the slope of approximating secant lines.

Historically, Fermat, in a treatise on finding maxima and minima, approached the problem of finding the tangent line by comparing the value of \(f(x)\) to a nearby value \(f(x+h)\). Working with polynomials meant that the difference \(f(x+h) - f(x)\) was itself a polynomial in \(h\). Dividing by \(h\) and then setting \(h=0\) yields an answer for the slope.
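
For instance, with \(f(x) = x^2\) the procedure looks like:

\[ \frac{f(x+h) - f(x)}{h} = \frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h, \]

and setting \(h=0\) leaves the familiar answer \(2x\).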

The more modern approach (due to Cauchy in the 1820s) defines the derivative of a function, \(f\), at a point \(c\) as the slope of the tangent line, defined in terms of the limit of the slopes of approximating secant lines. The notation is:

\[ f'( c ) = \lim_{h \rightarrow 0} \frac{f(c + h) - f( c)}{h}.\quad\text{Derivative at a point} \]

As mentioned, intuitively, the tangent line is the best straight-line approximation to a function near the point \((c, f( c))\) and would have slope given by the derivative.

This graph shows \(f(x) = 2 - x^2\) along with several secant lines at \(c=-0.75\):

f(x) = 2 - x^2; c = -0.75
sec_line(h) = x -> f(c) + (f(c + h) - f(c))/h * (x - c)
plot([f, sec_line(1), sec_line(.75), sec_line(.5), sec_line(.25)], -1, 1)

As the value of \(h\) goes towards \(0\) along the path \(1\), \(0.75\), \(0.5\), \(0.25\), ..., the slope of the secant line heads towards the slope of the tangent line, in this case \(-2 \cdot (-0.75) = 1.5\).
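
To see this numerically, we can compute the four secant slopes directly with the f and c just defined (each slope works out to \(1.5 - h\)):

[ (f(c + h) - f(c))/h for h in [1, 0.75, 0.5, 0.25] ]  ## 0.5, 0.75, 1.0, 1.25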

Using the idea of a derivative at a point, one defines the derivative of the function \(f\) to be the function which for each \(x\) returns the derivative of \(f\) at the point \(x\). Notationally, this just replaces \(c\) above with \(x\), but conceptually there is a bit more to it.

Forward differences

The rules of derivatives allow for a mechanical approach to taking derivatives with the power rule, chain rule and some special cases. In this project we look at approaching the derivative numerically. We start by investigating approximate derivatives computed with finite differences.

The most naive approximation is simply to assume \(h\) is some small number and use that to approximate the limit above:

f(x) = x^2 - 2x; fp(x) = 2x - 2
h = .001
## 0.001
c = 3
## 3
( f(c + h) - f(c) ) / h
## 4.000999999999699

This is known as the forward difference approximation to the derivative. For this example, the difference between the approximation and the actual slope is:

( f(c + h) - f(c) ) / h - fp(c)
## 0.0009999999996992415

That is not especially close -- only about 3 digits of accuracy.

Notationally, we may write this as:

\[ f'( c) \approx \frac{f(c + h) - f( c) }{h} \]

for some small value of \(h\). The equals sign is replaced by an approximation symbol, as there is no limit expression written.

Example:

Derivative at a point

Let's try the derivative of some well known function. We know the derivative of \(f(x) = \sin(x)\) is \(f'(x) = \cos(x)\). Let's see if the above works well:

f(x) = sin(x); fp(x) = cos(x)
c = pi/4; h = 0.0001
( f(c + h) - f(c) )/h - fp(c)
## -3.535651724428934e-5

Not as good as we can get -- the error is on the order of 1e-5 -- but not too bad already.

Example:

Finding profit

Suppose the tractor company John Deere models its profit in terms of units sold through the function \(P(x) = 50 e^{x/250}/(1 - e^{x/250})\). Find the marginal profit for \(x=200\).

The marginal profit is the change in profit per unit sold -- or the derivative -- at \(x=200\). We can find this quite simply by either differentiating directly or, as we do here, approximating the answer with julia. We first note that the function above is a composition, so it is written using two functions:

f1(x) = exp(x)/(1 - exp(x))
f(x) = 50 * f1(x/250)
c = 200; h = 0.0001;
res = (f(c+h) - f(c))/h
## 0.29635326299626286

If \(50\) is the maximum profit, this is a percentage increase of:

(res/50) * 100
## 0.5927065259925257

or a bit more than half a percent.

Example:

Finding the tangent line at a point

Let \(f(x) = x^x\). Find the tangent line at \(c=2\). Compare the difference between the value of the tangent line and the function at \(x=2.1\).

The tangent line is most easily expressed in terms of the point-slope formula for a line where the point is \((c, f( c))\) and the slope is \(f'( c)\). This gives:

\[ y = f( c) + f'( c)\cdot(x - c) \]

In the following, the slope of the tangent line will be approximated using a numeric derivative. We use \(h=0.0001\):

f(x) = x^x
c = 2; h = 0.0001
m = ( f(c + h) - f(c) ) / h  
## 6.7732621193528075
tangent_line(x) = f(c) + m * (x - c)

To compare the difference, we have:

f(2.1) - tangent_line(2.1)
## 0.07231187980696063

A graph shows this difference:

plot([f, tangent_line], 1.85, 2.15)

Practice


  • repeat at 2.05
  • find tangent line at ...
  • what is intercept ( x = 0)

Derivative of a function

We might be interested not in the derivative at a point, but in the derivative as a function. This is how we think of the derivative when we say if \(f(x) = \sin(x)\), then \(f'(x) = \cos(x)\). Mathematically, from this viewpoint, the derivative is referred to as an operator, as it takes a function and returns a function.

We can reproduce this behavior easily enough with julia, as functions are first class objects: they can be passed as arguments and returned as values.

First, let's define a function to find the derivative at a point using the "forward difference":

forward_difference(f, x0, h) = (f(x0 + h) - f(x0))/h

We need three arguments of course as we have three ingredients to worry about: the function, the point and the size of \(h\).
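
For example, approximating the derivative of \(x^2\) at \(3\), which should be close to \(2 \cdot 3 = 6\):

forward_difference(x -> x^2, 3, 1e-6)   ## approximately 6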

To make an operator that takes \(f\) and returns a function (an anonymous function in this case) that computes our approximation to \(f'\) we can do the following:

Df(f; h=1e-8) = x -> forward_difference(f, x, h)

We specified a default value of \(h\) for convenience, but allow it to be varied if desired.
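
For example, we can compare the error at \(\pi/4\) with the default \(h\) and with a larger one (the exact digits will vary):

Df(sin)(pi/4) - cos(pi/4)            ## error using the default h = 1e-8
Df(sin, h=1e-4)(pi/4) - cos(pi/4)    ## error using a larger h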

To see this work, we can find and plot the derivative of \(\sin(x)\):

f(x) = sin(x)
fp(x) = Df(f)(x)
plot([f, fp], 0, 2pi)

Well, we already knew that should just be \(\cos(x)\). The point is we can easily make an approximate derivative function from f with the definition fp(x) = Df(f)(x). The key here is that first Df(f) is computed, returning a function which is then evaluated at the specified x.

Defining fp or fp(x)

We could also have just plotted [f, Df(f)] without naming fp. As well, we could have written fp = Df(f), but that creates fp as an anonymous function and we then couldn't redefine it through fp(x) = ..., which would attempt to make it a generic function.
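
A small illustration of the distinction (the names fp1 and fp2 are made up for this example):

fp1 = Df(f)           ## binds the name fp1 to an anonymous function
fp2(x) = Df(f)(x)     ## creates a generic function fp2
fp1(1) - fp2(1)       ## both compute the same thing, so this is 0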


Let's look at a different function, where we don't know in our heads the answer.

f(x) = exp(x)/(1 + exp(x))
plot([f, Df(f)], 0, 5)

If we look, we can see from the graph that f is increasing and Df(f) is positive -- this is no coincidence, of course.

Example:

Critical points

A function's critical points are where its derivative is \(0\) or undefined. We can graphically find these by graphing the function's derivative. Let's do so for the polynomial \(f(x) = x^3 - 5x + 4\):

f(x) = x^3 -5x + 4
fp(x) = Df(f)(x)
plot(fp, -5, 5)

You can check the zeroes graphically, say by zooming in a bit and adding the line \(y=0\):

plot([fp, x->0], -2, 2) 

If you want, you can take the derivative by hand \(f'(x) = 3x^2 - 5\) and use the roots function to compare.

using Polynomial
p = [3, 0, -5]
## [3,0,-5]
roots(Poly(p))
## [-1.2909944487358056,1.2909944487358058]

polydir

We note briefly that the Polynomial package has the polydir function to take derivatives:

p = Poly([1, 0, -5, 4]);
pprime = polydir(p);
roots(pprime)
## [-1.2909944487358056,1.2909944487358058]

Example:

When is a function increasing?

Let \(f(x) = \cos(x)\) over the interval \([0, 2\pi]\). When is the function increasing?

From the relationship of the derivative and the function, we know the answer is when \(f'(x) > 0\). For this example, we can solve this directly, as \(f'(x) = -\sin(x)\) and we know when that is positive (well, you are supposed to anyway: when the angle is in the third and fourth quadrants). Let's see how we would do this task with julia and compare.

First, we only need the derivative, so we just ask:

f(x) = cos(x)
fp(x) = Df(f)(x) ## use default h

Then we wish to indicate on the graph where fp(x) > 0. We can do this by defining a function that is \(0\) when that is the case and NaN otherwise (so that those points are not plotted). We do so below with an anonymous function:

plot([f, x -> fp(x) > 0 ? 0 : NaN], 0, 2pi)

The graph of the second function is only drawn when the first function is increasing. This is from \(\pi\) to \(2\pi\), or the third and fourth quadrants, as anticipated.

Practice

question

Evaluate at ...

question

Evaluate at ...

question

Make a graph of fp, what is ...

question

find the critical points of ...

question

When is the function increasing?

Question

Filter used to return just those values, map

Improvements to the basic forward difference equation

The error in the approximation of the derivative depends on the size of \(h\). Mathematically, as \(h\) gets smaller we get better approximations, though with the computer other complications arise. To see mathematically that this is the case, let's look at the difference between the approximate numeric derivative from the forward difference and a known derivative.

f(x) = sin(x)
fp(x) = cos(x)
[ Df(f,h=h)(.5) - fp(.5) for h=[.1, .01, .001, .0001, .00001] ]
## {-0.0254132139820491,-0.0024117340199198978,-0.00023985901305767499,-2.3972739644051444e-5,-2.3971472203898614e-6}

It gets better as \(h\) gets smaller. Let's look a little deeper though. Rather than type in the values of \(h\) as above, let's use an expression to compute them. Here we find the powers \(10^{-1}, \dots, 10^{-16}\) at once and then compute the differences:

small_h = [(1/10)^i for i in 1:16];
out = [ Df(f,h=h)(.5) - fp(.5) for h in small_h ];
[small_h out]
## 16x2 Array{Any,2}:
##  0.1      -0.0254132  
##  0.01     -0.00241173 
##  0.001    -0.000239859
##  0.0001   -2.39727e-5 
##  1.0e-5   -2.39715e-6 
##  1.0e-6   -2.3969e-7  
##  1.0e-7   -2.4695e-8  
##  1.0e-8    2.85032e-10
##  1.0e-9    2.85032e-10
##  1.0e-10  -1.10737e-7 
##  1.0e-11  -1.22096e-6 
##  1.0e-12  -6.77208e-6 
##  1.0e-13   4.87391e-5 
##  1.0e-14  -0.000506372
##  1.0e-15   0.0105959  
##  1.0e-16   0.23264    

When we look, we see that for a while the approximation gets better (the error drops to about 1e-10), but then does a U-turn and starts getting worse. The mathematical approximation gets better and better, but the computational error gets worse. We'll see below how to pick the \(h\) that best balances these off, but first let's look at how using different approximations for the derivative can improve the "mathematical" error.
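
One way to pick out the best \(h\) from this table is to look for the row with the smallest absolute error, using the small_h and out vectors from above:

errs = [abs(e) for e in out]
val, idx = findmin(errs)    ## the smallest absolute error and its index
small_h[idx]                ## the corresponding h, near 1e-8 in this table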

Central difference

It turns out that just by looking to the left and the right we can improve the mathematical error from getting better at a rate of \(h\) to a rate of \(h^2\), provided our function is smooth enough and we don't have issues on the boundaries. The formula, called the central difference approximation to the derivative is:

\[ f'(x) \approx \frac{ f(x + h) - f(x - h) }{2h} \]

For this the mathematical error is like \(h^2\), not \(h\).
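
A Taylor series computation shows why: expanding \(f(x+h) = f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + \cdots\) and \(f(x-h) = f(x) - hf'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{6}f'''(x) + \cdots\) and subtracting, the even-order terms cancel, leaving

\[ \frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6}f'''(x) + \cdots \]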

Let's compare. To make our life easier we again create some functions, as we did with Df above.

central_difference(f, x, h) = (f(x+h) - f(x-h))/(2h)
Dc(f; h=0.0001) = x -> central_difference(f, x, h)

Now, to see whether the forward difference or the central difference works better, we can make a table. Again with \(f(x) = \sin(x)\):

f(x) = sin(x)
fp(x) = cos(x)
using_D =  [ Df(f,h=h)(.5) - fp(.5) for h in small_h ];
using_Dc = [ Dc(f,h=h)(.5) - fp(.5) for h in small_h ]; 
[small_h using_D using_Dc]
## 16x3 Array{Any,2}:
##  0.1      -0.0254132    -0.00146191 
##  0.01     -0.00241173   -1.46263e-5 
##  0.001    -0.000239859  -1.46264e-7 
##  0.0001   -2.39727e-5   -1.46274e-9 
##  1.0e-5   -2.39715e-6   -1.75036e-11
##  1.0e-6   -2.3969e-7     7.47635e-12
##  1.0e-7   -2.4695e-8    -2.70079e-10
##  1.0e-8    2.85032e-10   2.85032e-10
##  1.0e-9    2.85032e-10   2.85032e-10
##  1.0e-10  -1.10737e-7   -1.10737e-7 
##  1.0e-11  -1.22096e-6   -1.22096e-6 
##  1.0e-12  -6.77208e-6   -6.77208e-6 
##  1.0e-13   4.87391e-5    4.87391e-5 
##  1.0e-14  -0.000506372  -0.000506372
##  1.0e-15   0.0105959     0.0105959  
##  1.0e-16   0.23264       0.23264    

The errors for the central difference are either much smaller for the same size \(h\) or the same. We see that we can use a larger \(h\) to get the most accuracy in this example.

Example:

When does the tangent line hit 0?

Let \(f(x) = 10/(1+x^2) - 10\exp(-(1/2)x^2)\). The tangent line at \(x=c\) is given by

\[ y = f( c) + f'( c)(x - c) \]

and this intersects the \(x\) axis when \(y=0\). Solving this gives:

\[ x = c - f( c)/f'( c). \]

Our goal is to compute this value for any \(c > 0\).

Doing so is easy:

f(x) = 10/(1+x^2) - 10*exp(-(1/2)*x^2)
fp(x) = Dc(f)(x)
intersection_point(c) = c - f(c)/fp(c)

For example, when \(c=1\) we have:

c = 1;
intersection_point(c)
## 2.000000018978514

You can tell from the graph of \(f(x)\) that this value should be more than 1, as it is.

plot([f, x -> 0, x -> f(c) + fp(c)*(x-c)], .5, 2.1) 

More generally ...

More generally, we could have defined intersection_point to accept a function with:

intersection_point(c, f) = c  - f(c)/Dc(f)(c)

Then you could easily reuse it for other problems.
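
For example, with a made-up function \(g(x) = x^3 - 8\), the tangent line at \(c = 1.5\) crosses the \(x\) axis at about \(2.185\):

g(x) = x^3 - 8
intersection_point(1.5, g)   ## approximately 2.185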

The central fourth difference

One can create successively better mathematical approximations if more steps are used. (E.g., see http://en.wikipedia.org/wiki/Finite_difference_coefficients .) This next approximate derivative uses \(4\) nearby points:

\[ f'(x) \approx \frac{-f(x + 2h) + 8f(x + h) - 8f(x-h) + f(x - 2h)}{12h} \] and has order \(h^4\).

central_4th_difference(f, x, h) = (-f(x + 2h) + 8f(x+h) - 8f(x-h) + f(x-2h))/(12h)
Dc4(f; h=0.001) = x-> central_4th_difference(f, x, h)

Since this is the best we are going to do, we define Lagrange's useful ' notation through the following trick:

Base.ctranspose(f::Function) = Dc4(f)

Then we have:

f(x) = x^2
f'(1) ## should be 2(1) = 2
## 1.9999999999997704


Error analysis

The choice of \(h\) above was done somewhat arbitrarily, with only the intuition that smaller \(h\)'s should produce more accurate approximations (which of course may be way off, as we are subtracting like-sized values in the derivative formula). Here we look a bit more closely at how to choose \(h\).

As mentioned, with a bit of work from the second semester of calculus one can learn that the mathematical error in the forward difference is "order \(h\)" whereas the mathematical error in the central difference is "order \(h^2\)". This means that as \(h\) gets small the approximation error in the first is a multiple of \(h\), but for the second a multiple of \(h^2\) -- a much smaller error in general.

However, there is also error due to floating point approximation. Such error is different in that it gets bigger as \(h\) gets smaller. So one error gets bigger while the other gets smaller. Clearly if \(h\) gets too small, the floating point error will dominate and the overall error will not get smaller, but rather larger.

So, how big is the floating point error? For any given number, it can be roughly as big as the machine precision amount, \(\epsilon\). So in the forward-difference approximation we could have errors in both terms \(f(x+h)\) and \(f(x)\), so the total error could be as big as \(2\epsilon\). But this is divided by \(h\), as we have:

\[ \frac{f(x+h) - f(x)}{h} = \frac{(\text{float}(f(x+h)) + \epsilon) - (\text{float}(f(x)) - \epsilon)}{h} = \frac{\text{float}(f(x+h)) - \text{float}(f(x))}{h} + \frac{2\epsilon}{h}. \]

The errors may or may not cancel, so the algebra with \(\epsilon\) is unusual to the untrained eye. It basically takes into account the worst case.

The key is the \(2\epsilon/h\) term -- it gets bigger as \(h\) gets smaller.

So if each floating point approximation is no more off than \(\epsilon\), we have this bound on the error:

\[ \text{error} \leq \frac{2\epsilon}{h} + (M/2)h \]

where \(M\) is a constant that depends on the maximum absolute value of the second derivative and the \(1/2\) comes from second semester calculus. Choosing \(h\) to make this as small as possible takes some calculus involving the derivative, but for now we simply note the answer is \(h=((2\epsilon)/(M/2))^{1/2}\).
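
The minimization itself is just one more derivative, this time in \(h\):

\[ \frac{d}{dh}\left( \frac{2\epsilon}{h} + \frac{M}{2}h \right) = -\frac{2\epsilon}{h^2} + \frac{M}{2} = 0, \quad\text{so}\quad h = \left(\frac{2\epsilon}{M/2}\right)^{1/2}. \]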

Taking \(\epsilon\) as machine tolerance and (conveniently) \(M=1\) we get

\[ h_{min} = \sqrt{(2\epsilon)/(1/2)} \approx 10^{-8} \]

Or more precisely:

sqrt( (2*eps())/(1/2) )
## 2.9802322387695312e-8

We can check how this basically works for some test function:

f(x) = exp(x) - x^3
fp(x) = exp(x) - 3x^2
[small_h [Df(f,h=h)(.5) - fp(.5) for h in small_h] ]
## 16x2 Array{Any,2}:
##  0.1      -0.074746   
##  0.01     -0.00682885 
##  0.001    -0.000676365
##  0.0001   -6.75712e-5 
##  1.0e-5   -6.75649e-6 
##  1.0e-6   -6.75645e-7 
##  1.0e-7   -6.87972e-8 
##  1.0e-8    4.47752e-9 
##  1.0e-9   -1.73158e-7 
##  1.0e-10  -1.73158e-7 
##  1.0e-11   4.26773e-6 
##  1.0e-12  -0.000106755
##  1.0e-13  -0.00166107 
##  1.0e-14  -0.0105429  
##  1.0e-15  -0.0105429  
##  1.0e-16  -0.898721   

We see the best case is around \(h=10^{-8}\), as expected by the theory.

Approximation errors for the central difference

For the central difference, the errors are different. The error in \(h\) becomes:

\[ \text{error} \leq (2\epsilon)/(2h) + (M/6) h^2 \]

This gives \((3\epsilon/M)^{1/3}\) as the choice for \(h\) that minimizes the right-hand expression.

Taking \(\epsilon\) as the machine tolerance and (conveniently) \(M=3\) we get

\[ h_{min} \approx 10^{-6} \]

We can again check how this works for our function and point:

[small_h [ Dc(f,h=h)(.5) - fp(.5) for h in small_h] ]
## 16x2 Array{Any,2}:
##  0.1      -0.00725076 
##  0.01     -7.25212e-5 
##  0.001    -7.25213e-7 
##  0.0001   -7.25199e-9 
##  1.0e-5   -8.54954e-11
##  1.0e-6    3.6629e-11 
##  1.0e-7   -1.07359e-9 
##  1.0e-8    4.47752e-9 
##  1.0e-9   -6.21359e-8 
##  1.0e-10  -1.73158e-7 
##  1.0e-11   4.26773e-6 
##  1.0e-12  -0.000106755
##  1.0e-13  -0.00166107 
##  1.0e-14  -0.0105429  
##  1.0e-15   0.100479   
##  1.0e-16   0.211502   

Approximation errors for the central fourth difference

Under some assumptions, the error with the central fourth difference is bounded above by:

\[ \text{error} \leq (18/12)\epsilon/h + (M/30) h^4 \]

This has minimum value

\[ h_{min} = 2^{-2/5} (30\cdot 18 \epsilon/(12M))^{1/5} \]

If we take \(M=1\) then this is approximately \(10^{-3}\).
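
As before, we can evaluate this with the machine tolerance to see the size of the suggested \(h\):

2^(-2/5) * (30 * 18 * eps() / 12)^(1/5)
## roughly 0.0012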

Practice

question (Best depends on the function and value)

A subtle point above is that we are minimizing an upper bound in \(h\) under an assumption, not the actual error. The actual answer may differ, as it depends on the function and the point of evaluation. Look at the function \(f(x) = x^2 - 2x\) and compare the values at \(0.5\) as above:

begin
  small_h = [(1/10)^i for i in 1:16]
  f(x) = x^2 - 2x
  fp(x) = 2x - 2
  [small_h [Dc(f,h=h)(.5) - fp(.5) for h in small_h] ]
end
## 16x2 Array{Any,2}:
##  0.1      -1.89872
##  0.01     -1.89872
##  0.001    -1.89872
##  0.0001   -1.89872
##  1.0e-5   -1.89872
##  1.0e-6   -1.89872
##  1.0e-7   -1.89872
##  1.0e-8   -1.89872
##  1.0e-9   -1.89872
##  1.0e-10  -1.89872
##  1.0e-11  -1.89872
##  1.0e-12  -1.8987 
##  1.0e-13  -1.89848
##  1.0e-14  -1.89792
##  1.0e-15  -1.89792
##  1.0e-16  -2.00894

What is the best value of \(h\)?

For which of these values of \(h\) is the error smallest?

question (Error analysis)

Consider these commands:

begin
  f(x) = sin(x)
  fp(x) = cos(x)
  [small_h [ Dc4(f,h=h)(.5) - fp(.5) for h in small_h] ]
end
## 16x2 Array{Any,2}:
##  0.1      1.87758
##  0.01     1.87758
##  0.001    1.87758
##  0.0001   1.87758
##  1.0e-5   1.87758
##  1.0e-6   1.87758
##  1.0e-7   1.87758
##  1.0e-8   1.87758
##  1.0e-9   1.87758
##  1.0e-10  1.87758
##  1.0e-11  1.87758
##  1.0e-12  1.87759
##  1.0e-13  1.87772
##  1.0e-14  1.87523
##  1.0e-15  1.88818
##  1.0e-16  2.29526

Do they show the smallest error around \(10^{-3}\) as expected?

Automatic differentiation

We now discuss forward mode automatic differentiation. Automatic differentiation avoids the numeric instability issues of finite differences by using a different approach. Whereas finite differences have a long history, automatic differentiation only dates back to the 1960s.

Unlike finite differences, automatic differentiation does not have issues arising from the loss of precision encountered when subtracting two like-sized numbers. Unlike symbolic derivatives (such as those found by the Wolfram Alpha website used by your iPhone's Siri), it can be computed quickly, even if the computations defining the function involve many steps.

So what is automatic differentiation? The idea of the forward mode is quite intuitive. Automatic differentiation can be implemented by extending how we store a number. As with complex numbers, we use two components to keep track of a value. In this case, \((x, dx)\), where \(dx\) is the differential.

With this, the rules of differentials inform us how we combine these values. For example:

  • \((x, dx) + (y, dy) = (x + y, d(x+y)) = (x + y, 1dx + 1dy)\)

  • \((x, dx) \cdot (y, dy) = (xy, d(xy)) = (xy, y dx + x dy)\)

  • \((x, dx)^n = (x^n, d(x^n)) = (x^n, n x^{n-1} dx)\)

etc.

For functions we need to use the chain rule:

  • \(f((x, dx)) = (f(x), d(f(x))) = (f(x), f'(x) dx )\)

These formulas get abstracted by the dual numbers of the form \((a,b) = a + b \cdot j\) where \(j^2 = 0\). This is very similar to the complex numbers \(a + b \cdot i\) where we know \(i^2 = -1\). Let's see: for the dual numbers we would have \((a,b) + (c,d) = (a+c, b+d)\), which is exactly what we have for differentials. Looking at \((a,b) \cdot (c,d)\), this becomes \(ac + ad\cdot j + bc\cdot j + bd\cdot j^2 = (ac, ad + bc)\). Again, this is exactly what is the case for products with the differentials above.

The DualNumbers package provides an implementation of these dual numbers:

using DualNumbers

Now, how is this implemented? It takes two steps. First we need some means to make the new numbers. The dual function does this, lifting a value \(x\) up into \((x,dx)\) and returning it as a value of type Dual:

x = dual(5, 1)
## 5 + 1du
(real(x), imag(x))          # The two components
## (5,1)

Like the complex numbers, the real function gives the first part (the x), the imag function gives the second (the dx).

Changing our addition, multiplication, division, ... and functions requires us to overload these operations to do something different when they encounter objects of Dual type. This is implemented easily enough in julia, as functions can be restricted by the type of the argument. For example, we could extend the base \(\cos\) function to dual numbers with:

Base.cos(a::Dual) = dual(cos(real(a)), -sin(real(a)) * imag(a))

This does what any good cosine function should do for the first component and implements the chain rule for the second. Finally, it combines the values into an object with the proper type. We don't need to do this ourselves, though; the DualNumbers package has already done it for us.

Now, let's see how it all fits together by computing \(\cos(x^2)\) when \(x=5\).

By the rules above, \(x^2\) would be \((5^2, 2 \cdot 5^1 \cdot 1) = (25, 10)\), which we can check:

x = dual(5, 1)
## 5 + 1du
x^2
## 25 + 10du

Finally, \(\cos(x^2)\) would be \((\cos(25), -\sin(25) \cdot 10)\), using the fact that the derivative of \(\cos\) is \(-\sin\).

cos(x^2)
## 0.9912028118634736 + 1.3235175009777302du

The actual slope of the tangent line at \(5\) is the second number, which we can get by extracting the second component:

cos(x^2) |> imag
## 1.3235175009777302

Finding the derivative at a point, in this case \(x=5\), is the basis for the derivative function, but it is almost always more convenient to work with the function. An operator D can be defined as above to do the three steps:

D(f::Function) = x -> dual(x, 1) |> f |> imag

That is, D(f) is a function which takes x, lifts it up to be a dual object, applies f, then extracts the derivative part.

The function D is an operator, in that it takes a function and returns a (new) function which we use like other julia functions:

f(x) = cos(x)
g(x) = D(f)(x)
g(pi)            ## compare to -sin(pi) = 0
## -1.2246467991473532e-16

Or with plotting:

f(x) = exp(-x)*sin(x)
plot([f, D(f)], 0, 2pi)

The function D we just defined works with most functions -- but not all -- and in particular not functions returned by D itself. This makes it difficult to do second derivatives in this manner.

There is code in the Roots package that fleshes out this idea

using Roots
f(x) = sin(x)
fp(x) = D(f)(x)

This redefines the D operator, allowing one to pass the order as a second argument to D, as in D(f, 2) to find the second automatic derivative. This task is common enough that there is the shortcut D2. Be warned, higher order derivatives, say anything above 10, get quite slow given the simple implementation employed.

f(x) = x^2 - 2x
plot([f, D(f), D2(f)], -2, 2)

Practice

question

Which definition would be right for \((x,dx) - (y, dy)\)?

question

Which would be a proper definition for \(\exp\)?

question (Find tangent line intersection)

Redefine intersection_point as:

intersection_point(c, f) = c - f(c)/D(f)(c)

Now for \(f(x) = 10/(1+x^2) - 10 \cdot \exp(-(1/2) \cdot x^2)\) and \(c=1\) find the intersection point.

question (L'Hopital's rule)

Later, you will learn L'Hopital's rule, which says that if \(\lim_{x\rightarrow 0} f(x)/g(x)\) has indeterminate form \(0/0\), then if the limit of \(f'(x)/g'(x)\) exists, this will be the desired limit. This can make some limits easy to do in your head:

\[ \lim_{x \rightarrow 0} \frac{\sin(x)}{x} = \lim_{x \rightarrow 0} \frac{\cos(x)}{1} = 1 \]

The latter follows just by plugging in. Not only does this make mental calculations easier, it can be used through automatic differentiation to avoid the subtraction-of-similar-values issues we saw in looking at the limit at 0 of \((1-\cos(x))/x^2\).

Let,

f(x) = 1 - cos(x); g(x) = x^2

Look at the expressions

vals = [(1/10)^i for i in 1:16];
ratio = [f(x)/g(x) for  x in vals];
dratio = [D(f)(x)/D(g)(x) for  x in vals];
[ratio dratio]
## 16x2 Array{Any,2}:
##  0.499583  0.499167
##  0.499996  0.499992
##  0.5       0.5     
##  0.5       0.5     
##  0.5       0.5     
##  0.500044  0.5     
##  0.4996    0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     
##  0.0       0.5     

Do you see in the latter expression the trouble with numeric roundoff that is visible in the former?

question (L'Hopital's rule, in action)

Use L'Hopital's to find the limit as \(x\) goes to \(0\) of:

\[ \frac{\sqrt{1 - kx} - \sqrt{1 + kx}}{x}, \]

where \(k=5\).

question (What is the size of the error?)

We can compare the error at many different points with the following construct

f(x) = exp(-x)*sin(x)
fp(x) = exp(-x)*(cos(x) - sin(x))
[abs(D(f)(x) - fp(x) ) for x in linspace(0, pi, 25)]

What is the order (exponent) of the largest difference above?