Newton’s Method and Newton’s Gradient Descent

There are two methods commonly called Newton’s method. They are related but not the same. In this document, we discuss each of them and explain their similarities and differences.

Newton’s Method

Newton’s method is also known as the Newton-Raphson method or the Newton-Fourier method. It is an iterative numerical method for solving the equation \(f(x)=0\).

Begin with an initial value \(x_0\), and iteratively apply

\[x_{t+1} = x_t - f(x_t) / f'(x_t)\]

The sequence \(x_0, x_1, x_2, \cdots\) may converge, but convergence is not guaranteed. Usually, we prefer a good starting point that is not too far from the true solution. Also, applying Newton’s method does not require the existence of a second derivative.
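To make the iteration concrete, here is a minimal Python sketch. The names `f` and `f_prime`, the tolerance, and the iteration cap are illustrative choices, not part of the method itself:

```python
def newton(f, f_prime, x0, tol=1e-10, max_iter=100):
    """Find a root of f by Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:        # |f(x)| small enough: accept x as a root
            return x
        x = x - fx / f_prime(x)  # the update x_{t+1} = x_t - f(x_t) / f'(x_t)
    return x                     # best guess; may not have converged
```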

Example

Q: We would like to calculate the square root of \(37\).

We formulate the problem as finding the root of the function \(f(x) = x^2 - 37\). Let’s choose a good enough starting point, \(x_0 = 6\). Applying the iteration, we get

\[x_1 = x_0 - f(x_0) / f'(x_0) = 6 - [6^2 -37] / [2 \cdot 6] = 6 + 1/12 = 6.083333.\]

\(6.083333^2 = 37.006944\), which is already very close to \(37\). Let’s do another iteration:

\[x_2 = x_1 - f(x_1) / f'(x_1) = 6.083333 - [6.083333^2 - 37] / [2 \cdot 6.083333] = 6.082762\]

\(6.082762^2 = 37.0000003\), which is much closer.
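The iterates above can be reproduced with the sketch from the previous section (a quick numerical check, not a production square-root routine):

```python
f = lambda x: x**2 - 37  # a root of f is sqrt(37)
f_prime = lambda x: 2 * x

x = 6.0                  # starting point x0 = 6
for _ in range(2):
    x = x - f(x) / f_prime(x)
    print(x, x**2)
# prints values close to:
# 6.083333...  37.006944...
# 6.082762...  37.0000003...
```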

Why might it work?

By approximating the target function with its first-order Taylor expansion around \(x_t\), and requiring the next iterate \(x_{t+1}\) to be a root of this approximation, we have

\[f(x_{t+1}) \approx f(x_t) + f'(x_t)(x_{t+1} - x_t) = 0\]

Rearranging the equation, we get

\[x_{t+1} = x_t - f(x_t) / f'(x_t)\]

Newton’s Gradient Descent

Newton’s Gradient Descent iteratively searches for the minimum of an objective function. A necessary condition for an interior minimum of a differentiable function is that its gradient is zero. So, we use Newton’s Method (the method described above) to find a point where the gradient equals zero. Let us now reformulate the problem in mathematical language.

Define \(f(x)\) as the objective function to be minimized. We need to find a point \(x^*\) with \(f'(x^*) = 0\). To reduce confusion, let \(g(x) = f'(x)\), so the objective becomes finding the root of \(g(x)=0\). Applying Newton’s Method, we have

\[x_{t+1} = x_t - g(x_t) / g'(x_t) = x_t - f'(x_t) / f''(x_t)\]

That’s it.

Of course, here we assume \(f(x)\) is a scalar function of a scalar variable.
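As a minimal Python sketch (the quadratic objective and the starting point below are arbitrary choices for illustration):

```python
def newton_minimize(f_prime, f_double_prime, x0, tol=1e-10, max_iter=100):
    """Minimize a scalar function by driving its derivative to zero."""
    x = x0
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < tol:               # derivative (gradient) near zero
            return x
        x = x - g / f_double_prime(x)  # Newton step applied to g = f'
    return x

# Example: f(x) = x^2 - 4x + 7 has its minimum at x = 2.
x_star = newton_minimize(lambda x: 2 * x - 4,  # f'(x)
                         lambda x: 2.0,        # f''(x)
                         x0=10.0)
print(x_star)  # 2.0; a quadratic is minimized in a single Newton step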

Comparison

Newton’s method and Newton’s Gradient Descent are very similar. Newton’s method solves \(f(x) = 0\), while Newton’s Gradient Descent solves \(f'(x) = 0\). The underlying trick in both cases is a first-order Taylor approximation. Both methods extend to the multivariate case by replacing the division with multiplication by the inverse of the Jacobian (for root finding) or the Hessian (for minimization), provided that matrix is invertible.
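As a hedged NumPy sketch of the multivariate root-finding case (the helper `newton_nd` and the two-equation system below are illustrative choices, not taken from the text above):

```python
import numpy as np

def newton_nd(F, J, x0, tol=1e-10, max_iter=100):
    """Multivariate Newton: solve F(x) = 0, where F maps R^n to R^n."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        # Solve J(x) * dx = F(x) rather than forming the inverse explicitly.
        x = x - np.linalg.solve(J(x), Fx)
    return x

# Example: intersect the circle x^2 + y^2 = 4 with the line x = y.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0] - v[1]])
J = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
print(newton_nd(F, J, [1.0, 0.5]))  # approximately [1.41421356, 1.41421356]
```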