Banach fixed-point theorem

You may have heard of this theorem in many places, such as applying iteration methods to compute the root of a function. Here we describe the theorem and explain to understand and digest the theorem.

Pre-requests

Theorem

Let \((X, d)\) be a complete metric space and \(f\) be a contraction mapping on \(X\) with Lipschitz constant \(\gamma\). For any element \(x \in X\), denote \(xn\) be the result of iteratively applying \(f\) for \(n\) times

\[\lim_{n \rightarrow \infty} xn = x*\]

The convergence speed is captured by

\[d(x* , xn) \leq \gamma ^n / (1 - \gamma) d(x1, x)\]

Explanation

\((X, d)\) is complete if every Cauchy sequence in \(X\) converges to an element in \(X\). Consider an initial point \(x\). By applying contraction mapping \(f\) once, it’s going to be mapped to \(x1 \in X\). But we prefer to say \(x1 \in X1\) where \(X1\) is the set containing all the mapped elements from \(X\). Intuitively, \(X1\) is denser than \(X\).

As we compress \(X\) to \(X1\), then to \(X2\), …, \(Xn\), set \(Xn\) is much denser than \(X\). Banach therorem says that the extreme of \(Xn\) is a set containing one element \(x*\).

The proof is actually not difficult. We need to prove the uniqueness of \(x*\) and the existance of \(x*\).

Assume there is another extreme point \(x**\), then \(f(x*)=x*\) and \(f(x**) = x**\). If we measure the distance, \(d(f(x*), f(x**) = d(x*, x**)\) ! This violates the definite of contraction mapping with Lipschitz constant \(\gamma \in [0, 1)\), which should have \(d(f(x*), f(x**) < \gamma d(x*, x**)\). So only one extreme point.

For the sequence \(S=x, x1, x2, ..., xn, ...\), we only need to prove that it is a Cauchy sequence, then we prove the existance of the extreme point. The proof is a simple application of the definition of Cauchy sequence. For any two points in the sequence, say \(xm\) and \(xn\), with \(m < n\). The distance between them is

\[\begin{split}\begin{align} d(xm, xn) \leq& d(xm, xm-1) + d(xn, xm-1) \leq \gamma^{m-1}d(x1, x) + \gamma^{n-1} d(x1, x)\\ =& (\gamma^{m-1} +\gamma^{n-1} ) d(x1, x) \end{align}\end{split}\]

\((\gamma^{m-1} +\gamma^{n-1} )\) can be made arbitrarily close to 0 by choosing sufficiently large \(m\) and \(n\). By treating \(d(x1, x)\) as a constant, for any required \(\epsilon\) in the definition of Cauchy sequence, we can always satisfy \(d(xm, xn) < \epsilon\). So the sequence \(S\) is a Cauchy sequence.