Weierstrass approximation theorem

By | August 12, 2017

In this note, we prove the Weierstrass approximation theorem.

1. Bernstein’s approximation

Theorem. Let f:\left[0,1\right]\rightarrow\mathbb{R} be continuous and let \varepsilon>0 be given. Then there exists a polynomial p\left(x\right) such that \norm{p-f}{\infty}<\varepsilon.

We can construct a polynomial p explicitly. For each n, we define

    \[ p_{n}\left(x\right)=\sum_{k=0}^{n}\binom{n}{k}f\left(\frac{k}{n}\right)x^{k}\left(1-x\right)^{n-k}. \]

We call this polynomial as Bernstein polynomials. The polynomial can be interpreted in terms of probabilistic idea.

For 0\le x\le1, we interpret x as a probability of getting a head of the coin(success) and 1-x as a probability of getting a tail of the coin(fail). In n tosses, the probability of k-success is

    \[ \binom{n}{k}x^{k}\left(1-x\right)^{n-k}. \]

In the polynomial, f\left(\frac{k}{n}\right) has the meaning `f\left(\frac{k}{n}\right) dollars is paid out when exactly k heads turn up when n tosses are made). So the average amount paid out when n tosses are made is

    \[ \sum_{k=0}^{n}\binom{n}{k}f\left(\frac{k}{n}\right)x^{k}\left(1-x\right)^{n-k}. \]

As n\rightarrow\infty, then in a typical game, \frac{k}{n}\rightarrow x. So the average payout converges to f\left(x\right).

This is an intuitive idea, not a rigorous proof. Since most of people don’t know the theory of probability, we give a proof which does not contains any probabilistic interpretation.

Recall

    \[ \left(x+y\right)^{n}=\sum_{k=0}^{n}\binom{n}{k}x^{k}y^{n-k}. \]

Differentiating the identity with respect to x. Then

    \[ nx\left(x+y\right)^{n-1}=\sum_{k=0}^{n}k\binom{n}{k}x^{k}y^{n-k}. \]

Again differentiating the identity with respect to x. Then

    \[ n\left(n-1\right)x^{2}\left(x+y\right)^{n-2}=\sum_{k=0}^{n}k\left(k-1\right)\binom{n}{k}x^{k}y^{n-k}. \]

Let r_{k}\left(x\right)=\binom{n}{k}x^{k}\left(1-x\right)^{n-k}.
Put y=1-x. Then

    \[ \sum_{k=0}^{n}r_{k}\left(x\right)=1,\quad\sum_{k=0}^{n}kr_{k}\left(x\right)=nx,\quad\sum_{k=0}^{n}k\left(k-1\right)r_{k}\left(x\right)=n\left(n-1\right)x^{2}. \]

Hence by computation,

(1)   \begin{equation*} \sum_{k=0}^{n}\left(k-nx\right)^{2}r_{k}\left(x\right)=nx\left(1-x\right). \end{equation*}

Choose M so that \left|f\left(x\right)\right|\le M on \left[0,1\right]. Since f is uniformly continuous, for given \varepsilon, there exists \delta>0 such that \left|x-y\right|<\delta implies \left|f\left(x\right)-f\left(y\right)\right|<\varepsilon.
Now

    \begin{align*} \left|f\left(x\right)-p_{n}\left(x\right)\right| & =\left|f\left(x\right)-\sum_{k=0}^{n}f\left(\frac{k}{n}\right)r_{k}\left(x\right)\right|\\ & \le\sum_{k=0}^{n}\left|f\left(x\right)-f\left(\frac{k}{n}\right)\right|r_{k}\left(x\right). \end{align*}

We divide \left\{ 0,\dots,n\right\} into two parts: (i) \left|k-nx\right|<\delta n (ii) \left|k-nx\right|\ge\delta n.

If \left|k-nx\right|<\delta n, then \left|f\left(x\right)-f\left(\frac{k}{n}\right)\right|<\varepsilon. So in this case, the summation is bounded by \varepsilon since r_{k}\left(x\right)\ge0 and \sum r_{k}=1. For the second type, the corresponding summation is bounded by

    \[ 2M\sum_{k:\left|k-nx\right|\ge\delta n}r_{k}\left(x\right)\le\frac{2M}{n^{2}\delta^{2}}\sum_{k=0}^{n}\left(k-nx\right)^{2}r_{k}\left(x\right) \]

since \left|\left(k-nx\right)/n\delta\right|\ge1. By (1), this is bounded by

    \[ \frac{2M}{n\delta^{2}}\frac{1}{4}\le\frac{M}{2\delta^{2}n} \]

since x\left(1-x\right)\le\frac{1}{4} on \left[0,1\right]. Thus, for every \varepsilon>0, there is a \delta>0 such that

    \[ \norm{f-p_{n}}{\infty}\le\varepsilon+\frac{M}{2\delta^{2}n}. \]

Choose n so that \frac{M}{2\delta^{2}n}<\varepsilon. Then for any k\ge n,

    \[ \norm{f-p_{k}}{\infty}<2\varepsilon. \]

This completes the proof.

2. Landau’s approximation

There is another proof using special `approximation to the identity’. The definition of approximation to the identity is the following:

Definition. We say a family \left\{ K_{n}\right\} _{n=1}^{\infty} is said to be an approximation to the identity if

  • For all n\ge1, \int_{-\infty}^{\infty}K_{n}\left(x\right)dx=1
  • There exists M>0 such that for all n\ge1,

        \[ \int_{-\infty}^{\infty}\left|K_{n}\left(x\right)\right|dx\le M. \]

  • For every \delta>0,

        \[ \int_{\left|x\right|\ge\delta}\left|K_{n}\left(x\right)\right|dx\rightarrow0 \]

    as n\rightarrow\infty.

Roughly speaking, this concept concentrate and localize a function. One motivation of this concept is a dirac delta function. We leave the following theorem as an exercise.

Theorem. Let \left\{ K_{n}\right\} _{n=1}^{\infty} be a family of approximation to the identity. If f is continuous everywhere, then

    \[ \lim_{n\rightarrow\infty}\int_{-\infty}^{\infty}K_{n}\left(x-y\right)f\left(y\right)dy \]

converges uniformly to f\left(x\right).

Hence the above theorem gives an uniform approximation to continuous function. From now on, we denote \left(f*g\right)\left(x\right)=\int_{-\infty}^{\infty}f\left(x-y\right)g\left(y\right)dy.

Define the Landau kernels by

    \[ L_{n}\left(x\right)=\begin{cases} \frac{\left(1-x^{2}\right)^{n}}{c_{n}} & \text{if }-1\le x\le1\\ 0 & \text{if }\left|x\right|\ge1, \end{cases} \]

where c_{n} is chosen so that \int_{-\infty}^{\infty}L_{n}\left(x\right)dx=1.

Then one can check \left\{ L_{n}\right\} is a family of approximation to the identity.

Now we are ready to prove the Weierstrass approximation theorem again. By considering the translation, it suffices to consider a continuous function f supported in \left[-\frac{1}{2},\frac{1}{2}\right].

Since \left\{ L_{n}\right\} is a family of approximation to the identity, \left(L_{n}*f\right)\rightarrow f uniformly. Note that \left(L_{n}*f\right) is a sequence of polynomials on \left[-\frac{1}{2},\frac{1}{2}\right]. This completes the proof of Weierstrass approximation theorem.

Some remarks

First of all, with some a slight modification, Bernstein polynomial can be generalized to multivariable setting. Moreover, the following theorem holds.

Theorem. Let u\left(x\right) be a real-valued function defined on \mathbb{R}^{n} with continuous derivatives u_{x_{i}}, u_{x_{i}x_{j}} on \mathbb{R}^{n} for all i,j=1,\dots,n. Then there exists a sequence of polynomials u^{k}\left(x\right) in x_{1},\dots,x_{n} such that u^{k},u_{x_{i}}^{k},u_{x_{i}x_{j}}^{k} converges to u,u_{x_{i}},u_{x_{i}x_{j}}, respectively, for all i,j=1,\dots,n, uniformly on any compact set.

The proof of this theorem is quite technical. So we omit the proof. One can see the proof e.g. Krylov’s introduction to the theory of diffusion. But this book is not appropriate to those who are taking undergraduate analysis course. Here the proof uses some theory of probability theory, the weak law of large numbers. One may ask why such theorem is needed. This theorem is a quite good lemma for proving some fundamental theorem in the theory of stochastic integrals.

Another way to generalize the theorem in the following sense. Let \mathcal{P} denotes the space of polynomials. The Weierstrass theorem  gives \mathcal{P} is dense in C\left(\left[a,b\right]\right)
with the uniform norm. Actually, the class \mathcal{P} can be regarded as a special case. This was proved by Stone.

A family of \mathscr{A} of real functions defined on a set E is said to be an algebra if (i) f+g\in\mathscr{A}, (ii) fg\in\mathscr{A} and (iii) cf\in\mathscr{A} for all f,g\in\mathscr{A} and for all real constants c.

If \mathscr{A} has the property that f\in\mathscr{A} whenever f_{n}\in\mathscr{A} and f_{n}\rightarrow f uniformly on E, we say \mathscr{A} uniformly closed.

We say \mathscr{A} separate points on E if to every pair of distinct points x_{1},x_{2}\in E, there exists a function f\in\mathscr{A} such that f\left(x_{1}\right)\neq f\left(x_{2}\right).

Note that the class \mathcal{P} clearly satisfies separate points on \mathbb{R}.

Theorem (Stone). Let \mathscr{A} be an algebra of real continuous function on a compact set K. If \mathscr{A} separates points on K and if \mathscr{A} vanishes at no point of K, then the set of all functions which is the limits of uniformly convergent sequence of members of \mathscr{A} consists of all real continuous functions
on K.

In this sense, the theorem holds not only in \mathbb{R} but also \mathbb{R}^{n}. Moreover, the theorem does not depend on Euclidean structure. Those who are interested in its proof, see Rudin’s PMA.

Leave a Reply

Your email address will not be published. Required fields are marked *