The process of integration has an easy answer: $\int_a^b f(x) dx = F(b) - F(a)$ when you can find an antiderivative $F$. But that isn't always the case.
The Risch algorithm characterizes when an elementary function has an antiderivative of a specific form. The original problem is posed by Liouville. Not all elementary functions (functions obtained by composing exponentials, logarithms, radicals, trigonometric functions, and the four arithmetic operations (+ − × ÷)) are integrable. For example, $f(x) = \exp(x^2)$.
We can approximate an integral a few ways. First, if we approximate $f$ by $g$, then $\int_a^b f(x) dx \approx \int_a^b g(x) dx$.
For example, if we approximate $\sin(x)$ by its Taylor polynomial of degree $5$ we get an approximation.
using Roots using Polynomials f(x) = sin(x) x = poly([0.0]) g = sum([ D(f,k)(0)*x^k / factorial(k) for k in 0:5]) [g, polyint(g)]
2-element Array{Polynomials.Poly{Float64},1}: Poly(x - 0.16666666666666666x^3 + 0.008333333333333333x^5) Poly(0.5x^2 - 0.041666666666666664x^4 + 0.001388888888888889x^6)
Integrating over $[0,\pi]$ gives an exact answer of 2. This one approximates via:
diff(map(x->polyval(polyint(g), x), [0, pi]))
1-element Array{Float64,1}: 2.21135
Not so close!
The Taylor series isn't the best approximation over an interval, rather it is good near a point, in this case $c=0$.
using Plots plot([f, x -> polyval(g, x)], 0, pi)
We discussed that there exists a unique polynomial of degree $n$ or less that goes through the points $(x_i, f(x_i))$ for $i=0, 1, \dots, n$. We gave a formula in terms of divided differences. This is due to Newton. Lagrange gave an alternate form. Let
$$~ l_i(x) = \prod_{j=0, j\neq i}^n \frac{x - x_j}{x_i-x_j}. ~$$Then, $p(x) = \sum_0^n f(x_i)l_i(x)$ so for $n+1$ points chosen in $[a,b]$ we have
$$~ \int_a^b f(x) dx \approx \int_a^b p(x) dx = \int_a^b \sum f(x_i) l_i(x) dx = \sum f(x_i) \int_a^b l_i(x) dx. ~$$If $A_i = \int_a^b l_i(x) dx$, then we have the integral is approximately $\sum A_i f(x_i)$. (The $A_i$ do not depend on $f$, so could in theory be pre-computed.)
A Newton-Cotes formula is one where the $x_i$ are equally spaced over $[a,b]$.
Issues here are related to the approximating polynomials getting squirrelly out near the ends of the interval (Runge phenomenon).
Consider the easiest case $x_0$ some point in $[a,b]$. Then we have the approximate integral: $f(x_0)(b-a)$.
What is the error? We know that the error using $n+1$ points is $f(x) - p(x) = f^{(n+1)}(\xi_x)/(n+1)! (x-x_0)\cdots(x-x_n)$, so in this case we have:
$$~ \begin{align} \int_a^b f(x) dx - \int_a^b p(x) &= \int_a^b (f(x) - p(x)) dx\\ &= \int_a^b f'(\xi_x)(x-x_0) dx\\ &\approx f'(\xi) \frac{(x-x_0)^2}{2} \mid_a^b \end{align} ~$$If we were to take $x_0 = a$, then this becomes $f'(\xi)(b-a)^2/2$. So it depends on the length of the interval and the derivative.
Now suppose, $x_0=a, x_1=b$. So $n=1$. Then we have the trapezoid rule. We have then
$$~ l_0(x) = \frac{b-x}{b-a}, \quad \text{ and } l_1(x) = \frac{x-a}{b-a} ~$$And $A_0 = A_1 = (b-a)/2$.
So, $\int_a^b f(x) dx \approx f(x_0)A_0 + f(x_1) A_1 = (b-a)/2 \cdot (f(a) + f(b))$. The formula for the trapezoid.
To compute the error, we have $f(x) - p(x) = f''(\xi_x)/2 (x-a)(x-b)$. Is we write:
$$~ \int_a^b (f(x) - p(x))dx = -\int_a^b f''(\xi_x)/2 (x-a)(x-b) dx = -f''(\xi) \int_a^b (x-a)(x-b)/2 dx = -f''(\xi) (b-a)^3/12. ~$$Again, the error depends on the length of the integral cubed.
Take the partition $a=x_0 < x_1 < \cdots < x_{n-1} < x_n = b$ and apply the trapezoid rule to each pair. This will give an approximation formula. Suppose $x_i = a + (b-a)/n \cdot i$. Then the formulas become:
$$~ \int_a^b f(x) dx \approx \frac{h}{2}(f(a) + 2 \sum_{i=1}^{n-1} f(a + ih) + f(b)), \quad h=(b-a)/2. ~$$The errors add to give:
$$~ \text{error } = -\sum_i^n f''(\xi_i) h^3/12 \approx -nf''(\xi) h^3/12 =- f''(\xi) \frac{(b-a)^3}{12n^2}. ~$$Using the points $a = x_0 < x_1 < x_2=b$ with $x_1$ the midpoint, yields Simpson's rule.
The approximation becomes:
$$~ \int_a^b f(x) dx = \frac{b-a}{6}(f(a) + 4f(\frac{a+b}{2}) + f(b)) ~$$With error, $-1/90 \cdot ((b-a)/2)^5 f^{(4)}(\xi)$.
If we apply over an equally spaced partition, the errors accumulate to yield
$$~ \text{Simpson's error} = -\frac{1}{180} \frac{(b-a)^5}{n^4} f^{(4)}(\xi) ~$$We can numerically compare these errors.
riemann(f, a,b) = f(a)*(b-a) trapezoid(f,a,b) = 1/2 * (b-a)*(f(b) + f(a)) simpson(f, a, b) = (b-a)/6*(f(a) + 4f((a+b)/2) + f(b))
simpson (generic function with 1 method)
Let's look at $f(x) = e^x$ between $0$ and $\log(2)$ with an answer of $e^{log(2)} - e^0 = 2 - 1 = 1$.
n = 10 a,b = 0, log(2) xs = linspace(a, b, n+1) xis = zip(xs[1:end-1], xs[2:end]) f(x) = exp(x) r = sum([riemann(f, xi, xi1) for (xi,xi1) in xis]) t = sum([trapezoid(f, xi, xi1) for (xi,xi1) in xis]) s = sum([simpson(f, xi, xi1) for (xi,xi1) in xis]) [r-1 t-1 s-1]
1x3 Array{Float64,2}: -0.034257 0.000400345 8.01396e-9
If we have a formula to approximate $\int_c^d f(x) dx$ then we can do a linear $u$-substitution to integrate over $[a,b]$. The transform is
$$~ u(t) = \frac{b-a}{d-c}t + \frac{ad - bc}{d-c} ~$$And then
$$~ \int_a^b f(x) dx = \frac{b-a}{d-c}\int_c^d f(u(t)) dt. ~$$It is conventional to use $a=-1, b=1$.
The Newton-Cotes formulas can be generalized to handle integrals with weight functions:
$$~ \int_a^b f(x) w(x) dx \approx \int_a^b \sum f(x_i) l_i(x) w(x) dx = \sum f(x_i) \int_a^b l_i(x) w(x) = \sum f(x_i) A_i. ~$$As before, only with the weight function defining $A$. Again, the $A_i$ do not depend on $x$. If we have $a,b=-1,1$, then they only depend on how the $x_i$ are chosen.
Integrating the general error term and assuming $|f^{(n+1)}(\xi) | \leq M$ we have
$$~ | \int_a^b f(x) dx - \sum A_i f(x_i)| \leq \frac{M}{(n+1)!} \int_a^b \prod|x - x_i| dx. ~$$Some choice of $x_i$ will minimize this product.
Claim: For $a=-1, b=1$, the following will minimize:
$$~ x_i = \cos(\frac{(i+1)\pi}{n+2}), \quad 0 \leq i \leq n. ~$$Let's investigate:
using Interact, Gadfly xs = linspace(-1,1, 251) f(xs) = x -> prod([abs(x-xi) for xi in xs]) @manipulate for x0=xs, x1=xs, x2=xs, x3=xs vs = [x0,x1,x2,x3] v = quadgk(f(vs), -1, 1)[1] plot(f(vs), -1, 1) title!(string(v)) end
How to compare:
n = 3 i = 0:n cos((i+1)*pi/(n+2))
4-element Array{Float64,1}: 0.809017 0.309017 -0.309017 -0.809017
These values are the zeros of the function
$$~ U_{n+1}(x) = \frac{\sin((n+2)\theta)}{\sin(\theta)}, \quad x = \cos(\theta). ~$$These functions are polynomials in $x$! Called the Chebyshev polynomials of the second kind.
They have leading coefficient $2^{n+1}$, as
$$~ \int_{-1}^1 |\prod (x-x_i)| dx = 2^{-n} ~$$With these nodes, one can minimize the error bound to get:
$$~ |\text{error}| \leq \frac{M}{(n+1)! 2^n}. ~$$Thm (p487): For monic polynomials $p$ of degree $n$, the Chebyshev polynomials minimize $\int_{-1}^1 |p(x)| dx$.