Sunday, July 26, 2015

A New Characterization of the Fourier Transform

The Fourier transform arises in many different ways. Historically, it first arose as a sort of limiting procedure for Fourier series. The goal was to extend the theory of Fourier series to functions which were aperiodic. It arose again in the context of Banach algebras as a special case of what is known as a Gelfand transform. The Fourier transform can also be arrived at by considering the spectral properties of the Laplace operator - this is quite similar in nature to the way it was discovered historically. Perhaps the most elegant approach to establishing the Fourier transform is as a lifted representation of characters on $\mathbb{R}$ to the $L^1(\mathbb{R})$ algebra. I will discuss this last characterization in an upcoming post.

Characterizing the Fourier Transform

The Gaussian - as any mathematician or physicist knows - plays an instrumental role in mathematics. It arises in the central limit theorem, in Brownian motion, as the Green's function for the heat equation, and in many other places. In particular, the Gaussian, denoted $g$, plays a critical role in the theory of the Fourier transform: it is not only an eigenfunction of the Fourier transform (meaning $\mathcal{F}g=g$), but it is also the minimizer for the Heisenberg uncertainty product. In the literature, the role the Gaussian plays is often viewed as a happy coincidence.

In this post, I give a new way to characterize the Fourier transform. This characterization is not equivalent to any of the ones above and is a bit unconventional in the following sense: typically, an operator is defined and then, among other things, its spectral properties - including its eigenfunctions - are analyzed. Here the eigenfunction comes first. The Fourier transform is characterized as the integral transform $\mathcal{F}$ with kernel $\varphi$ which satisfies
  1. $\mathcal{F}g = g$,
  2. $\varphi(\omega, t) = f(\omega t)$ for some complex-valued $f$,
  3. $\varphi:\mathbb{R}^2\to\mathbb{C}$ is real analytic,
  4. If $\varphi = c + is$, where $c$ and $s$ are real-valued, then $c$ is even and $s$ is odd,
  5. $c$ and $s$ satisfy the same differential equation, and
  6. $\mathcal{F}$ is an isometry when restricted to a dense subspace of $L^2(\mathbb{R})$.
In essence, the Gaussian is taken to be the defining characteristic for the Fourier transform; the rest is added to ensure uniqueness.
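As a preview, here is a quick numerical sanity check of condition 1 - a rough sketch, assuming the kernel $\frac{1}{\sqrt{2\pi}}e^{-i\omega t}$ that the derivation below will produce; the grid sizes are illustrative only.

```python
import numpy as np

# Rough numerical check of condition 1: with the kernel (1/sqrt(2*pi)) * exp(-i*w*t)
# (the kernel derived below), the Gaussian g(t) = exp(-t^2/2) is mapped to itself.
t = np.linspace(-20, 20, 200001)
dt = t[1] - t[0]
g = np.exp(-t**2 / 2)

for w in [0.0, 0.5, 1.0, 2.5]:
    Fg = np.sum(np.exp(-1j * w * t) / np.sqrt(2 * np.pi) * g) * dt  # Riemann sum for (F g)(w)
    print(w, Fg.real, np.exp(-w**2 / 2))  # the two columns agree; the imaginary part is ~0
```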

Deriving the Fourier Transform

Since $\varphi$ is real analytic, we can express it as a power series which converges everywhere. Now consider the first condition: both sides of $\mathcal{F}g = g$ are real-valued. This means that when we integrate $s$ against the Gaussian, we must get zero - otherwise $\mathcal{F}g$ would be complex-valued in general. As such, we cannot hope to uncover what $s$ must be from property 1. Moreover, if we added any slowly growing odd function (e.g. $\omega t$) to $c$, the integral against the Gaussian would be unchanged; thus, to have uniqueness, we must require that $c$ be even.

Thus we can restrict our attention to property 1 in the context of only $c$:

$$e^{-\frac{\omega^2}{2}} = \int_{-\infty}^{\infty} c(\omega t) e^{-\frac{t^2}{2}}\,dt.$$

Writing $c(\eta) = \sum_{n=0}^{\infty} c_n \eta^{2n}$ (note that we are using the assumption that $c$ is even), we get

$$e^{-\frac{\omega^2}{2}} = \int_{-\infty}^{\infty} \sum_{n=0}^{\infty} c_n(\omega t)^{2n} e^{-\frac{t^2}{2}}\,dt.$$

For now let us forsake rigor and simply interchange integral and summation:

$$e^{-\frac{\omega^2}{2}} = \sum_{n=0}^{\infty} 2 c_n \omega^{2n}\int_0^{\infty} t^{2n}e^{-\frac{t^2}{2}}\,dt.$$

Making a change of variable $z = \frac{t^2}{2}$, this becomes

$$e^{-\frac{\omega^2}{2}} = \sum_{n=0}^{\infty} 2 c_n \omega^{2n}\int_0^{\infty} ((2z)^{\frac{1}{2}})^{2n-1} e^{-z}\,dz = \sum_{n=0}^{\infty}2^{n+\frac{1}{2}} c_n\omega^{2n} \int_0^{\infty} z^{n-\frac{1}{2}} e^{-z}\,dz.$$

This integral can be immediately recognized as the gamma function evaluated at $n+\frac{1}{2}$. One of the key properties of the gamma function is that for natural numbers $n$,

$$\Gamma\left(n+\frac{1}{2}\right) = \frac{(2n)!\sqrt{\pi}}{2^{2n} n!}.$$

Our expression then becomes

$$e^{-\frac{\omega^2}{2}}=\sqrt{2\pi}\sum_{n=0}^{\infty}\frac{(2n)!c_n}{2^nn!} \omega^{2n}.$$
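As an aside, the half-integer gamma identity used in this step is easy to confirm numerically; here is a minimal sketch using Python's built-in math.gamma.

```python
import math

# Check Gamma(n + 1/2) = (2n)! * sqrt(pi) / (2^(2n) * n!) for the first few n.
for n in range(6):
    lhs = math.gamma(n + 0.5)
    rhs = math.factorial(2 * n) * math.sqrt(math.pi) / (2 ** (2 * n) * math.factorial(n))
    print(n, lhs, rhs)  # the two values agree to machine precision
```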

Since $e^{-\frac{\omega^2}{2}}$ is an analytic function, we can write it as a power series and thus equate coefficients on both sides of the above equation:

$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n!}\omega^{2n} = \sqrt{2\pi}\sum_{n=0}^{\infty} \frac{(2n)!c_n}{2^nn!} \omega^{2n}.$$

Hence $c_n = \frac{(-1)^n}{\sqrt{2\pi}\,(2n)!}$, which gives

$$c(\eta) = \frac{1}{\sqrt{2\pi}}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\eta^{2n} = \frac{1}{\sqrt{2\pi}}\cos(\eta)$$

as desired. At this point, an application of Fubini-Tonelli justifies interchanging our limiting procedures. Alternatively, it can be deduced via uniform convergence of the power series for $c$.
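A short numerical sketch (with an illustrative grid) confirms that this $c$ does reproduce the Gaussian when integrated against it:

```python
import numpy as np

# Check that integrating c(w*t) = cos(w*t)/sqrt(2*pi) against the Gaussian
# reproduces exp(-w^2/2), i.e. condition 1 restricted to the even part c.
t = np.linspace(-20, 20, 400001)
dt = t[1] - t[0]
g = np.exp(-t**2 / 2)

for w in [0.0, 0.7, 1.3, 2.0]:
    lhs = np.sum(np.cos(w * t) / np.sqrt(2 * np.pi) * g) * dt
    print(w, lhs, np.exp(-w**2 / 2))  # the two columns agree
```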

Now that we have correctly deduced $c$, we are only tasked with determining $s$. To do so, we must consider what differential equation(s) $c$ solves. The most obvious differential equation that $\cos(\eta)$ solves is

$$\frac{d^2}{d\eta^2}\cos(\eta) = -\cos(\eta).$$

As such, we wish to find the odd solutions of the equation

$$\frac{d^2}{d\eta^2} s = -s$$

to determine $s$. From basic differential equation theory, these are $s(\eta) = C\sin(\eta)$, where the constant $C$ is to be determined. We determine $C$ by invoking condition 6; for our purposes, we need only pick an odd function in $L^2(\mathbb{R})$, and the simplest choice is $f(t) = te^{-\frac{t^2}{2}}$.

Computing $\mathcal{F}f$ (the contribution from $c$ vanishes, since $c(\omega t)\,te^{-\frac{t^2}{2}}$ is odd in $t$), we get

\begin{eqnarray*}
\mathcal{F}f(\omega) &=& Ci\int_{-\infty}^{\infty} \sin(\omega t)t e^{-\frac{t^2}{2}}\,dt \\
&=& Ci\int_{-\infty}^{\infty} \frac{e^{i\omega t} - e^{-i\omega t}}{2i} t e^{-\frac{t^2}{2}}\,dt \\
&=& \frac{C}{2}\int_{-\infty}^{\infty} te^{-\frac{t^2}{2}+i\omega t}\,dt - \frac{C}{2}\int_{-\infty}^{\infty} t e^{-\frac{t^2}{2} - i\omega t}\,dt
\end{eqnarray*}

Employing the standard completing-the-square trick, we can rewrite the exponents as

$$-\frac{t^2}{2} + i\omega t = -\left(\frac{t - i\omega}{\sqrt{2}}\right)^2 - \frac{\omega^2}{2}, \qquad -\frac{t^2}{2} - i\omega t = -\left(\frac{t + i\omega}{\sqrt{2}}\right)^2 - \frac{\omega^2}{2}.$$

Our integrals then become

$$\mathcal{F}f(\omega) = \frac{C}{2}e^{-\frac{\omega^2}{2}} \left(\int_{-\infty}^{\infty} t e^{-\left(\frac{t-i\omega}{\sqrt{2}}\right)^2}\,dt - \int_{-\infty}^{\infty} t e^{-\left(\frac{t+i\omega}{\sqrt{2}}\right)^2}\,dt\right)$$

Naively, one would make the change of variable $z = t \pm i\omega$, but then our integrals get shifted onto contours in the complex plane. To mitigate that, we simply note that our integrands are entire functions and thus, by Cauchy's theorem, integrate to zero around any closed contour. So if we integrate around rectangles with one horizontal side $[-R, R]$ on the real axis, the other horizontal side on the line $\operatorname{Im} z = \pm\omega$, and the two vertical segments joining them, we get a value of zero.

It is not hard to argue that the contributions of the vertical segments vanish as $R$ tends to infinity, so the integral of our function along the real axis equals its integral along the shifted line. This justifies the naive change of variable.
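A quick numerical sketch (for a single illustrative value of $\omega$) supports this: the integral along the real axis already agrees with the value the naive substitution predicts, namely $i\omega\sqrt{2\pi}$ for the first integral.

```python
import numpy as np

# Check the contour-shift claim for the first integral: integrating
# t * exp(-(t - i*w)^2 / 2) along the real axis gives i*w*sqrt(2*pi),
# the value predicted by the naive substitution z = t - i*w.
t = np.linspace(-30, 30, 600001)
dt = t[1] - t[0]
w = 1.5

lhs = np.sum(t * np.exp(-((t - 1j * w) ** 2) / 2)) * dt
rhs = 1j * w * np.sqrt(2 * np.pi)
print(lhs, rhs)  # both print approximately 3.76j
```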

As such, we are left with evaluating

$$\mathcal{F}f(\omega) = \frac{C}{2}e^{-\frac{\omega^2}{2}}\left(\int_{-\infty}^{\infty} (t+i\omega) e^{-\frac{t^2}{2}}\,dt - \int_{-\infty}^{\infty} (t - i\omega) e^{-\frac{t^2}{2}}\,dt \right).$$

Making use of the oddness of $te^{-\frac{t^2}{2}}$, this simplifies immediately to

$$\mathcal{F}f(\omega) = iC\omega e^{-\frac{\omega^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\,dt = i\sqrt{2\pi}C\omega e^{-\frac{\omega^2}{2}}.$$

Since $f$ is actually an eigenfunction of $\mathcal{F}$ with eigenvalue $i\sqrt{2\pi}C$, the only way for $\mathcal{F}$ to be an isometry on $L^2(\mathbb{R})$ is if $C = \pm \frac{1}{\sqrt{2\pi}}$. Thus we obtain the following functional form for $\varphi$:

$$\varphi(\omega, t) = \frac{1}{\sqrt{2\pi}} e^{\pm i \omega t}$$

and thus the Fourier transform emerges. The $\pm$ is to be expected since the Fourier transform is only unique up to a sign in the exponent.
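To close the loop, here is a minimal numerical sketch (with the sign choice $e^{-i\omega t}$, i.e. $C = -\frac{1}{\sqrt{2\pi}}$) checking that the odd test function from above is sent to $-i$ times itself - an eigenvalue of modulus one, as the isometry condition demands.

```python
import numpy as np

# With phi(w, t) = exp(-i*w*t)/sqrt(2*pi), i.e. C = -1/sqrt(2*pi), the test
# function f(t) = t*exp(-t^2/2) should satisfy F f = -i * f.
t = np.linspace(-20, 20, 400001)
dt = t[1] - t[0]
f = t * np.exp(-t**2 / 2)

for w in [0.5, 1.0, 2.0]:
    Ff = np.sum(np.exp(-1j * w * t) / np.sqrt(2 * np.pi) * f) * dt
    print(w, Ff, -1j * w * np.exp(-w**2 / 2))  # the two columns agree
```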

Friday, July 24, 2015

An Axiomatic Approach to the Complex Numbers

One question which comes up quite frequently regarding the complex numbers is: why do we define them and their operations the way we do? Everyone knows the old adage about solving quadratic and cubic equations and setting $i^2 = -1$, but this doesn't really capture the why of complex numbers in my opinion. In a previous post, I established the complex numbers as the field $\mathbb{R}[x]/(x^2+1)$. This is a beautiful way to view complex numbers if you happen to know algebra quite well; if you do not, then this is not a very satisfactory characterization. This raises the question: what is the most natural characterization of the complex numbers with the least technical overhead? Recently, I found a characterization that I think captures the essence of complex numbers quite well and should appeal to many people's sensibilities.

In this post, I will go through the characterization and derivation. I have not seen this axiomatic approach to the complex numbers presented elsewhere; while it is probably not novel, it does have a nice simplicity and elegance to it.

Characterizing $\mathbb{C}$

We know from experience that the complex numbers are elements of the form $x+iy$. Alternatively, we can - and often do - view complex numbers as points in the $\mathbb{R}^2$ plane. We associate $x+iy$ with the coordinate pair $(x,y)$. In the coordinate pair representation of complex numbers, complex multiplication is given by $(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2, x_1y_2 + x_2y_1)$. Our goal is to derive this from reasonable axioms. The coordinate pair representation (i.e. the $\mathbb{R}^2$ representation) will be the starting point.
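For concreteness, here is a tiny sketch of the target rule; it simply checks the coordinate-pair product against Python's built-in complex multiplication (the helper name mul is my own).

```python
# Coordinate-pair multiplication we want to derive, checked against Python's
# built-in complex numbers.
def mul(p, q):
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

p, q = (3.0, -2.0), (1.5, 4.0)
print(mul(p, q))                  # (12.5, 9.0)
print(complex(*p) * complex(*q))  # (12.5+9j)
```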

The coordinate plane $\mathbb{R}^2$ is in fact a vector space over $\mathbb{R}$. It is a standard exercise for first time linear algebra students to prove this. What this means is that $\mathbb{R}^2$ comes equipped with two operations: vector addition and scalar multiplication. These operations must satisfy the usual vector space axioms.

Vector addition is defined as so: $(x_1,y_1) + (x_2,y_2) = (x_1+x_2,y_1+y_2)$. Scalar multiplication is defined by the expression $\alpha(x,y) = (\alpha x,\alpha y)$. These two serve as part of the platform for defining the complex numbers.

Before we start deriving $\mathbb{C}$, we must first consider how vector spaces and fields differ. Every field is a vector space over itself, so we know that there is some non-trivial relationship between the two. However, not every vector space can be turned into a field. What separates vector spaces from fields is that fields carry an additional structure: multiplication. You can of course define a multiplication on vector spaces (e.g. $(x_1,\ldots, x_n)(y_1,\ldots, y_n) = (x_1y_1,\ldots,x_ny_n)$), but it may not always give a meaningful or useful structure. Every nonzero element in a field must have a multiplicative inverse, multiplication must be commutative, and multiplication must distribute over addition. Since we already know that $\mathbb{C}$ is somehow constructed from $\mathbb{R}^2$, what we are tasked with is deriving the multiplication on $\mathbb{C}$; in other words, $\mathbb{C}$ is a "fieldification" of $\mathbb{R}^2$.

Exercise: Check that the multiplication given in the preceding paragraph does not give a field on $\mathbb{R}^n$ except when $n=1$.
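As a hint for the case $n = 2$ of the exercise, the following sketch shows that the componentwise product has zero divisors, so nonzero elements like $(1,0)$ cannot have multiplicative inverses.

```python
# Componentwise multiplication on R^2 has zero divisors: the product of the
# nonzero elements (1, 0) and (0, 1) is (0, 0), so it cannot give a field.
def componentwise(p, q):
    return tuple(a * b for a, b in zip(p, q))

print(componentwise((1.0, 0.0), (0.0, 1.0)))  # (0.0, 0.0)
```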

Let us now list some observations/assumptions:

  1. We can identify $\mathbb{R}$ with the $x$ axis in $\mathbb{R}^2$. Meaning that if we multiplied $(x_1,0)$ with $(x_2,0)$, then we should get $(x_1x_2,0)$ since this is how we multiply real numbers.
  2. Since $\mathbb{R}^2$ is a vector space over $\mathbb{R}$, we can think of the scalar multiplication equation $\alpha(x,y) = (\alpha x,\alpha y)$ as being equivalent to $(\alpha, 0)(x,y) = (\alpha x,\alpha y)$.
  3. Since we want a field structure, we need that multiplication is commutative, that is to say that $(x_1,y_1)(x_2,y_2) = (x_2,y_2)(x_1,y_1)$. Moreover, every nonzero element has a multiplicative inverse.
  4. It's natural to want that if $(x_1,y_1)$ is a unit vector (in the Euclidean metric, meaning it lies on the unit circle), then $(x_1,y_1)(x_2,y_2)$ has the same length as $(x_2,y_2)$. This is a generalization of the fact that $|1\cdot x| = |x| = |-1 \cdot x|$, i.e. if you multiply $x$ by a unit vector ($1$ or $-1$ in $\mathbb{R}$), then its length is unchanged.
The first two properties can be thought of as pushing the algebraic structure of $\mathbb{R}$ onto $\mathbb{R}^2$; the third is a necessary condition for a field structure; the fourth, however, is a geometric condition which really captures the essence of the complex numbers. Any field structure on $\mathbb{R}^2$ containing $\mathbb{R}$ (that is, a two-dimensional field extension of $\mathbb{R}$) will satisfy the first three properties, and there are many multiplications that accomplish this. It turns out that they are all isomorphic as fields, but the other choices lack the geometry of the complex numbers. Our fairly modest fourth condition is what clinches the deal.

Deriving Complex Multiplication

Since we need our multiplication to distribute over addition, it must be of the form

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} \alpha_1 x_1x_2 + \alpha_2 x_1 y_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ \beta_1 x_1x_2 + \beta_2 x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right).$$

The reason for this is that since the map $(x_2,y_2)\mapsto(x_1,y_1)(x_2,y_2)$ distributes over addition and scalar multiplication (by the field axioms), it is actually a linear map. Every linear map must be of the above form. Prove this yourself if you are not convinced. The constants $\alpha_1,\ldots,\alpha_4,\beta_1,\ldots,\beta_4$ must be independent of $(x_1,y_1)$ and $(x_2,y_2)$ in order for multiplication to be well-defined. Thus they are universal to our problem at hand.

Let us consider then what happens when we enforce our first property. If we multiply two "real" numbers (i.e. numbers of the form $(x,0)$), then our multiplication equation becomes

$$\left(\begin{array}{c} x_1 \\ 0 \end{array}\right)\left(\begin{array}{c} x_2 \\ 0\end{array}\right) = \left(\begin{array}{c} \alpha_1 x_1x_2 \\ \beta_1 x_1x_2\end{array}\right)$$

since $y_1 = 0 = y_2$. However we stated that we should get $(x_1x_2,0)$ when we multiply $(x_1,0)$ and $(x_2,0)$. This gives that $\alpha_1 = 1$ and $\beta_1 = 0$ by equating our components.

Let us now consider what happens when we impose our second property. Multiplying $(x_1,0)$ and $(x_2,y_2)$ gives

$$\left(\begin{array}{c} x_1 \\ 0\end{array}\right)\left(\begin{array}{c} x_2 \\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 + \alpha_2 x_1y_2 \\ \beta_2 x_1y_2\end{array}\right).$$

Our second condition stated that this should give us $(x_1 x_2, x_1y_2)$. Equating components forces $\alpha_2 = 0$ and $\beta_2 = 1$. With these two properties, we've significantly simplified our general expression for multiplication to the following:

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 +  \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right).$$

By imposing our third condition of $(x_1,y_1)(x_2,y_2) = (x_2,y_2)(x_1,y_1)$, we must have that the following is true:

$$\left(\begin{array}{c} x_1x_2 +  \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right) = \left(\begin{array}{c} x_1x_2 + \alpha_3 x_1y_2 + \alpha_4 y_1y_2 \\ x_2y_1 + \beta_3 x_1y_2 + \beta_4 y_1y_2\end{array}\right).$$

Again equating components: the first components force $\alpha_3 = 0$ (since $y_1x_2$ and $x_1y_2$ are independent monomials), and the second components then force $\beta_3 = 1$. We are thus left with the following expression for our multiplication:

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 +  \alpha_4 y_1y_2 \\ x_1y_2 + y_1x_2 + \beta_4 y_1 y_2\end{array}\right).$$

At this point, all that is left to determine are $\alpha_4$ and $\beta_4$. Before narrowing these two down to - hopefully - get $\mathbb{C}$ out of the mix, let us see where we stand. The second part of our third condition is that every nonzero element has a multiplicative inverse. Particularly, this means that $(x_1,y_1)(x_2,y_2) = (0,0)$ if and only if one or both of $(x_1,y_1)$ and $(x_2,y_2)$ are $(0,0)$.

Viewing multiplication by $(x_1,y_1) \neq (0,0)$ as a linear mapping again, the above argument gives that the kernel of the induced linear map is trivial, i.e. its determinant is nonzero. The linear map in question is of the form

$$\left(\begin{array}{cc} x_1 & \alpha_4 y_1 \\ y_1 & x_1 + \beta_4 y_1\end{array}\right).$$

The determinant of this matrix is $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2$. If the matrix is to be invertible for every $(x_1,y_1)\neq(0,0)$, then this determinant can never be zero, i.e. the equation $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2 = 0$ has no nonzero solutions. If $y_1 = 0$, the determinant is $x_1^2 > 0$, so we may assume $y_1 \neq 0$. Viewing the determinant as a quadratic in $x_1$, the only way for it to have no real roots is for its discriminant to be negative (which corresponds to non-real solutions, though this is putting the cart before the horse). The discriminant in this case is $\beta_4^2 y_1^2 + 4\alpha_4 y_1^2$.

Since $y_1\neq 0$, we have $\beta_4^2y_1^2 + 4\alpha_4 y_1^2 < 0$ if and only if $\beta_4^2 + 4\alpha_4 < 0$, i.e. $\alpha_4 < -\frac{\beta_4^2}{4}$. Picking any $\alpha_4$ and $\beta_4$ which satisfy this condition gives rise to a field structure as desired. However, as stated above, these fields do not all capture the geometry we would like to associate to the complex numbers, so it is not obvious from the first three axioms alone what we should pick as the definition of the complex numbers. Enter our fourth property.
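Before invoking it, here is a small numerical sketch of the discriminant condition: for a given $(\alpha_4, \beta_4)$ it scans the unit circle (which suffices, since the determinant is homogeneous in $(x_1, y_1)$) and reports how close the determinant of the multiplication matrix comes to zero.

```python
import numpy as np

# The determinant of the multiplication-by-(x1, y1) matrix is
# x1^2 + beta4*x1*y1 - alpha4*y1^2.  It is homogeneous of degree 2, so it
# vanishes at some nonzero (x1, y1) iff it vanishes somewhere on the unit circle.
def min_abs_det(alpha4, beta4, n=200001):
    theta = np.linspace(0, 2 * np.pi, n)
    x, y = np.cos(theta), np.sin(theta)
    return np.min(np.abs(x**2 + beta4 * x * y - alpha4 * y**2))

print(min_abs_det(-1.0, 0.0))  # beta4^2 + 4*alpha4 = -4 < 0: bounded away from zero
print(min_abs_det(-2.0, 1.0))  # beta4^2 + 4*alpha4 = -7 < 0: bounded away from zero
print(min_abs_det(1.0, 0.0))   # beta4^2 + 4*alpha4 =  4 > 0: (numerically) zero, so zero divisors exist
```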

Our fourth property - while seemingly very modest - is actually a very strong condition. It forces a lot of geometry into our field. Let us now invoke our fourth property in two different ways. Consider $(0,1)$. This is a unit vector so $(0,1)(0,1)$ should have length one. When we multiply these what we get is $(\alpha_4, \beta_4)$. The only way for this vector to have length one is if $\alpha_4^2 + \beta_4^2 = 1$, i.e. $\alpha_4 = \pm \sqrt{1-\beta_4^2}$.

With a concrete relationship between $\alpha_4$ and $\beta_4$, we can now sort out what the values must be. Let us multiply $(0,1)$ and $(\cos\theta,\sin\theta)$. Both of these are unit vectors so the length of the resulting vector must be one as above. Multiplying the vectors, we get $\left(\pm\sqrt{1-\beta_4^2}\sin\theta, \cos\theta + \beta_4\sin\theta\right)$.

The squared length of this vector is $(1-\beta_4^2)\sin^2\theta + \cos^2\theta + 2\beta_4\sin\theta\cos\theta + \beta_4^2\sin^2\theta$, which simplifies to $1 + \beta_4\sin2\theta$. This can only equal $1$ if $\beta_4 = 0$ or $\theta$ is a multiple of $\frac{\pi}{2}$. Picking any $\theta$ other than those values, it follows that $\beta_4$ must be zero since $\beta_4$ must be universal. Thus $\alpha_4 = \pm 1$. It is quite straightforward to show that the only choice for $\alpha_4$ which preserves lengths is $\alpha_4 = -1$ (indeed, $\alpha_4 = +1$ already violates the discriminant condition above), as the sketch below also illustrates.
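Here is that sketch: with $\beta_4 = 0$, it samples unit vectors $u$ and arbitrary vectors $v$ and measures the worst violation of $|uv| = |v|$ for $\alpha_4 = -1$ and $\alpha_4 = +1$.

```python
import numpy as np

# With beta4 = 0 the product is (x1*x2 + alpha4*y1*y2, x1*y2 + y1*x2).
# Property 4 says multiplying by a unit vector preserves Euclidean length;
# this holds for alpha4 = -1 and fails badly for alpha4 = +1.
def mul(p, q, alpha4):
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 + alpha4 * y1 * y2, x1 * y2 + y1 * x2)

rng = np.random.default_rng(0)
for alpha4 in (-1.0, 1.0):
    worst = 0.0
    for _ in range(1000):
        theta = rng.uniform(0, 2 * np.pi)
        u = (np.cos(theta), np.sin(theta))   # a unit vector
        v = tuple(rng.normal(size=2))        # an arbitrary vector
        w = mul(u, v, alpha4)
        worst = max(worst, abs(np.hypot(*w) - np.hypot(*v)))
    print(alpha4, worst)  # essentially zero for alpha4 = -1; order one for alpha4 = +1
```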

Thus the multiplication on $\mathbb{R}^2$ we posited reduces to

$$(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2, x_1y_2 + y_1x_2)$$

which is the usual multiplication on $\mathbb{C}$! We also get that $(0,1)(0,1) = (-1,0)$, which is a restatement of $i^2 = -1$. Moreover, the matrix representation of complex numbers fell out naturally along the way.
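Indeed, specializing the matrix from the invertibility discussion to $\alpha_4 = -1$ and $\beta_4 = 0$, multiplication by $(x_1, y_1)$ is given by the familiar matrix representation of the complex number $x_1 + iy_1$:

$$\left(\begin{array}{cc} x_1 & -y_1 \\ y_1 & x_1\end{array}\right).$$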

The other admissible choices of $\alpha_4$ and $\beta_4$ give rise to "different" fields, though really they are all equivalent to $\mathbb{C}$. The reason is that $(0,1)$ satisfies $(0,1)^2 = (\alpha_4,\beta_4)$, i.e. it is a root of $x^2 - \beta_4 x - \alpha_4$, and the negative discriminant ensures that this polynomial has no roots over $\mathbb{R}$. Quotienting $\mathbb{R}[x]$ by this polynomial gives a degree two field extension of $\mathbb{R}$, which is isomorphic to $\mathbb{C}$.