Newton's method is not the only algorithm of its kind for identifying zeros of a function. In this section we discuss some alternatives.
The function find_zero from the Roots package provides several different algorithms for finding a zero of a function, including some derivative-free algorithms for finding roots when started with an initial guess. The default method is similar to Newton's method in that only a good initial guess is needed. However, the algorithm, while slower in terms of function evaluations and steps, is engineered to be a bit more robust to the choice of initial estimate than Newton's method. (If it finds a bracket, it will use a bisection algorithm, which is guaranteed to converge but can be slower to do so.) Here we see how to call the function:
using Roots
f(x) = cos(x) - x
x0 = 1
find_zero(f, x0)
0.7390851332151607
Compare to this related call which uses the bisection method:
find_zero(f, (0, 1)) ## [0,1] must be a bracketing interval
0.7390851332151607
For this example both give the same answer, but the bisection method is a bit less convenient as a bracketing interval must be pre-specified.
The default find_zero method above uses a secant-like method unless a bracketing interval is found. Here we place the secant method into a more general framework.
One way to view Newton's method is through the inverse of $f$: if $f(\alpha) = 0$ then, when $f^{-1}(x)$ exists, $\alpha = f^{-1}(0)$. If $f$ has a simple zero at $\alpha$ and is locally invertible (that is, some $f^{-1}$ exists), then the update step for Newton's method can be identified with: fitting a polynomial to the local inverse function of $f$ going through the point $(f(x_0),x_0)$, and matching the slope of $f^{-1}$ at the same point.
That is, we can write $g(y) = h_0 + h_1 (y-f(x_0))$. Then $g(f(x_0)) = x_0 = h_0$, so $h_0 = x_0$, and from $g'(f(x_0)) = 1/f'(x_0)$ we get $h_1 = 1/f'(x_0)$, so $g(y) = x_0 + (y-f(x_0))/f'(x_0)$. At $y=0$, we get the update step $x_1 = g(0) = x_0 - f(x_0)/f'(x_0)$.
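As a quick numeric check (a minimal sketch, reusing $f(x) = \cos(x) - x$ from above; the names fp and g are ours, with the derivative computed by hand), evaluating the linear fit to the inverse at $y=0$ reproduces the Newton update:

f(x) = cos(x) - x
fp(x) = -sin(x) - 1                # f′, computed by hand
x0 = 1.0
g(y) = x0 + (y - f(x0)) / fp(x0)   # line through (f(x0), x0) with slope 1/f′(x0)
g(0) ≈ x0 - f(x0) / fp(x0)         # true: this is exactly Newton's update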
The same viewpoint can be used to create derivative-free methods.
For example, the secant method can be seen as the result of fitting a polynomial approximation for $f^{-1}$ through two points $(f(x_0),x_0)$ and $(f(x_1), x_1)$.
Again, expressing this as $g(y) = h_0 + h_1(y-f(x_1))$ leads to $g(f(x_1)) = x_1 = h_0$. Substituting $f(x_0)$ gives $g(f(x_0)) = x_0 = x_1 + h_1(f(x_0)-f(x_1))$. Solving for $h_1$ leads to $h_1=(x_1-x_0)/(f(x_1)-f(x_0))$. Then $x_2 = g(0) = x_1 - f(x_1) \cdot (x_1-x_0)/(f(x_1)-f(x_0))$. This is the first step of the secant method:
\[ x_{n+1} = x_n - f(x_n) \frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}. \]
The secant method simply replaces $f'(x_n)$ with the slope of the secant line between $x_n$ and $x_{n-1}$. The method is historic, dating back over $3000$ years.
We code the update step as λ2:
λ2(f0,f1,x0,x1) = x1 - f1 * (x1-x0)/(f1-f0)
λ2 (generic function with 1 method)
Then we can run a few steps to identify the zero of sine, starting with $x_0=4$ and $x_1=3$:
x0,x1 = 4,3
f0,f1 = sin.((x0,x1))
@show x1,f1

x0,x1 = x1, λ2(f0,f1,x0,x1)
f0,f1 = f1, sin(x1)
@show x1,f1

x0,x1 = x1, λ2(f0,f1,x0,x1)
f0,f1 = f1, sin(x1)
@show x1,f1

x0,x1 = x1, λ2(f0,f1,x0,x1)
f0,f1 = f1, sin(x1)
@show x1,f1

x0,x1 = x1, λ2(f0,f1,x0,x1)
f0,f1 = f1, sin(x1)
x1,f1
(x1, f1) = (3, 0.1411200080598672)
(x1, f1) = (3.157162792479947, -0.015569509788328599)
(x1, f1) = (3.14154625558915, 4.639800062679684e-5)
(x1, f1) = (3.1415926554589646, -1.8691713617942337e-9)
(3.141592653589793, 1.2246467991473532e-16)
Like Newton's method, the secant method converges quickly for this problem (though its rate is less than the quadratic rate of Newton's method).
This method is included in Roots as Order1() (or Roots.Secant()):
find_zero(sin, (4,3), Order1(), verbose=true)
Results of univariate zero finding:

* Converged to: 3.141592653589793
* Algorithm: Roots.Secant()
* iterations: 4
* function evaluations: 6
* stopped as |f(x_n)| ≤ max(δ, max(1,|x|)⋅ϵ) using δ = atol, ϵ = rtol

Trace:
x_0 = 3.0000000000000000, fx_0 = 0.1411200080598672
x_1 = 3.1571627924799470, fx_1 = -0.0155695097883286
x_2 = 3.1415462555891498, fx_2 = 0.0000463980006268
x_3 = 3.1415926554589646, fx_3 = -0.0000000018691714
x_4 = 3.1415926535897931, fx_4 = 0.0000000000000001

3.141592653589793
The derivative is only the limit of the slopes of secant lines, so the secant slope merely approximates $f'$. As a consequence, the convergence of the secant method is not as fast as Newton's method. However, each step of the secant method requires only $1$ new function evaluation, so it can be more efficient for functions that are expensive to compute or differentiate.
Let $\epsilon_{n+1} = x_{n+1}-\alpha$, where $\alpha$ is assumed to be a simple zero of $f(x)$ that the secant method converges to. A calculation shows that
\[
\begin{align*}
\epsilon_{n+1} &\approx \frac{x_n-x_{n-1}}{f(x_n)-f(x_{n-1})} \cdot \frac{(1/2)f''(\alpha)(\epsilon_n-\epsilon_{n-1})}{x_n-x_{n-1}} \cdot \epsilon_n \epsilon_{n-1}\\
&\approx \frac{f''(\alpha)}{2f'(\alpha)} \epsilon_n \epsilon_{n-1}\\
&= C \epsilon_n \epsilon_{n-1}.
\end{align*}
\]
The constant $C$ is similar to that for Newton's method, and it reveals potential troubles for the secant method like those of Newton's method: a poor initial guess (the initial error is too big), a second derivative that is too large, or a first derivative that is too flat near the answer.
Assuming the error term has the form $\epsilon_{n+1} = A|\epsilon_n|^\phi$ and substituting into the above leads to the equation
\[
\frac{A^{1+1/\phi}}{C} = |\epsilon_n|^{1 - \phi + 1/\phi}.
\]
The left side is a constant while $|\epsilon_n|$ varies, which suggests the exponent must be zero; that is, $\phi$ solves $1 - \phi + 1/\phi = 0$, or $\phi^2 -\phi - 1 = 0$. The positive solution is the golden ratio, $(1 + \sqrt{5})/2 \approx 1.618\dots$.
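This rate can be checked empirically. In the following sketch we reuse the function values printed in the secant run for sine above; since $|f(x_n)| \approx |f'(\alpha)| \cdot |\epsilon_n|$ near a simple zero, ratios of successive differences of $\log|f(x_n)|$ estimate $\phi$:

fs = [0.1411200080598672, -0.015569509788328599, 4.639800062679684e-5,
      -1.8691713617942337e-9, 1.2246467991473532e-16]
ls = log.(abs.(fs))
[(ls[i+1] - ls[i]) / (ls[i] - ls[i-1]) for i in 2:4]  # estimates trend toward 1.618...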
Steffensen's method is a secant-like method that converges with $|\epsilon_{n+1}| \approx C |\epsilon_n|^2$. The secant line is taken between the points $(x_n,f(x_n))$ and $(x_n + f(x_n), f(x_n + f(x_n)))$. Like Newton's method, this requires $2$ function evaluations per step. Steffensen's method is implemented through Roots.Steffensen().
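A single step of Steffensen's method can be expressed with the λ2 function from above. The following is a minimal sketch (the name steffensen_step is ours, not part of Roots):

# One Steffensen step: a secant step between x and x + f(x)
steffensen_step(f, x) = (fx = f(x); λ2(fx, f(x + fx), x, x + fx))

x = 3.0
x = steffensen_step(sin, x)
x = steffensen_step(sin, x)
x = steffensen_step(sin, x)
x  # rapidly approaches π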
Inverse quadratic interpolation fits a quadratic polynomial through three points, not just two as with the secant method, the third point being $(f(x_2), x_2)$.
For example, a plot of the inverse quadratic function, $g(y)$, going through three such points (marked with red dots) would show the next iterate at the point $(g(0), 0)$ (marked with a blue dot).
Here we use SymPy to identify the polynomial as a function of $y$, then evaluate it at $y=0$ to find the next step:
using SymPy
@syms y h[0:2] x[0:2] fn[0:2]
H(y) = sum(hᵢ*(y-fn[end])^i for (hᵢ,i) ∈ zip(h, 0:2))
eqs = [Eq(H(fᵢ), xᵢ) for (xᵢ, fᵢ) ∈ zip(x,fn)]
ϕ = solve(eqs, h)
hy = subs(H(y), ϕ)
The value of hy at $y=0$ yields the next guess based on the past three, and is given by:
q⁻¹ = hy(y=>0)
Though the above can be simplified quite a bit when computed by hand, here we simply make it a function with lambdify, which we will use below.
λ3 = lambdify(q⁻¹) # fs, then xs
#101 (generic function with 1 method)
(The lambdify function, by default, orders its arguments lexicographically; in this case they will be the f values and then the x values.)
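As a quick illustration (a sketch, with points we chose for this example), here is one inverse quadratic step for sine through the points $3$, $3.5$, and $4$:

# f values first, then the corresponding x values
λ3(sin(3), sin(3.5), sin(4), 3, 3.5, 4)  # ≈ 3.13, already close to π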
An inverse quadratic step is utilized by Brent's method, when possible, to yield a rapidly convergent bracketing algorithm that is implemented as a default zero finder in many software languages. Julia's Roots package implements the method in Roots.Brent(). An inverse cubic interpolation is utilized by the algorithm of Alefeld, Potra, and Shi, which gives an asymptotically even more rapidly convergent algorithm than Brent's (implemented in Roots.AlefeldPotraShi() and also Roots.A42()). This is used as a finishing step in many cases by the default hybrid Order0() method of find_zero.
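For comparison, here is a minimal sketch calling these bracketing methods on the running example; both identify the same zero of sine:

find_zero(sin, (3, 4), Roots.Brent())  # 3.141592653589793
find_zero(sin, (3, 4), Roots.A42())    # 3.141592653589793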
In a bracketing algorithm, the next step should reduce the size of the bracket, so the next iterate should be inside the current bracket. However, an inverse quadratic step does not guarantee this. As such, sometimes a substitute step must be chosen.
Chandrupatla's method is a bracketing method utilizing an inverse quadratic step as its centerpiece. The key insight is the test used to choose between this inverse quadratic step and a bisection step. This is done in the following based on values of $\xi$ and $\Phi$ defined within:
function chandrupatla(f, u, v, λ, verbose=false)
    a,b = promote(float(u), float(v))
    fa,fb = f(a),f(b)
    @assert fa * fb < 0

    if abs(fa) < abs(fb)
        a,b,fa,fb = b,a,fb,fa
    end
    c, fc = a, fa

    maxsteps = 100
    for ns in 1:maxsteps
        Δ = abs(b-a)
        ϵ = max(eps(a),eps(b))
        if Δ < 2ϵ
            return abs(fa) < abs(fb) ? a : b
        end
        abs(fa) < ϵ && return a

        ξ = (a-b)/(c-b)
        Φ = (fa-fb)/(fc-fb)

        if Φ^2 < ξ < 1 - (1-Φ)^2
            xt = λ(fa,fc,fb, a,c,b) # inverse quadratic step
        else
            xt = a + (b-a)/2        # bisection step
        end

        ft = f(xt)
        isnan(ft) && break

        if sign(fa) == sign(ft)
            c,fc = a,fa
            a,fa = xt,ft
        else
            c,b,a = b,a,xt
            fc,fb,fa = fb,fa,ft
        end

        verbose && @show ns, a, fa
    end
    error("no convergence: [a,b] = $(sort([a,b]))")
end
chandrupatla (generic function with 2 methods)
Like bisection, this method ensures that $a$ and $b$ form a bracket, but it moves $a$ to the newest estimate, so it does not maintain $a < b$ throughout.
We can see it in action on the sine function. Here we pass in λ3 for $\lambda$, but in a real implementation we would have programmed the algorithm to compute the inverse quadratic value.
chandrupatla(sin, 3, 4, λ3, true)
(ns, a, fa) = (1, 3.5, -0.35078322768961984)
(ns, a, fa) = (2, 3.1315894157911264, 0.010003070970892524)
(ns, a, fa) = (3, 3.141678836157296, -8.618256739611538e-5)
(ns, a, fa) = (4, 3.141592600257386, 5.3332407057633926e-8)
(ns, a, fa) = (5, 3.1415926535898007, -7.42705188753633e-15)
(ns, a, fa) = (6, 3.141592653589793, 1.2246467991473532e-16)
3.141592653589793
The condition Φ^2 < ξ < 1 - (1-Φ)^2 can be visualized. Assume a,b = 0,1 and fa,fb = -1/2,1. Then c < a < b, and fc has the same sign as fa, but what values of fc will satisfy the inequality?
using ImplicitEquations
using Plots

ξ(c,fc) = (a-b)/(c-b)
Φ(c,fc) = (fa-fb)/(fc-fb)
Φl(c,fc) = Φ(c,fc)^2
Φr(c,fc) = 1 - (1-Φ(c,fc))^2

a,b = 0,1
fa,fb = -1/2, 1

r = ImplicitEquations.Lt(Φl, ξ) & ImplicitEquations.Lt(ξ, Φr)
plot(r, xlims=(-2,a), ylims=(-3,0))
When (c,fc) is in the shaded area, the inverse quadratic step is chosen. We can see that fc < fa is needed.
For these values, this area is within the area where an inverse quadratic step will result in a value between a and b:
l(c,fc) = λ3(fa,fb,fc,a,b,c)
r = ImplicitEquations.Lt(l,b) & ImplicitEquations.Gt(l,a)
plot(r, xlims=(-2,0), ylims=(-3,0))