
Sunday, July 26, 2015

A New Characterization of the Fourier Transform

The Fourier transform arises in many different ways. Historically, it first arose as a sort of limiting procedure for Fourier series, the goal being to extend the theory of Fourier series to functions which are aperiodic. It arose again in the context of Banach algebras as a special case of what is known as a Gelfand transform. The Fourier transform can also be arrived at by considering the spectral properties of the Laplace operator - this is quite similar in nature to the way it was discovered historically. Perhaps the most elegant approach to establishing the Fourier transform is as a lifted representation of characters on $\mathbb{R}$ to the $L^1(\mathbb{R})$ algebra. I will discuss this last characterization in an upcoming post.

Characterizing the Fourier Transform

The Gaussian - as any mathematician or physicist knows - plays an instrumental role in mathematics. It arises in the central limit theorem, in Brownian motion, as a Green's function for the heat equation, and in many other places. In particular, the Gaussian, denoted $g$, plays a critical role in the theory of the Fourier transform: it is not only an eigenfunction of the Fourier transform (meaning $\mathcal{F}g = g$), but it is also the minimizer of the Heisenberg uncertainty product. In the literature, the role the Gaussian plays is viewed as a happy coincidence.

In this post, I give a new way to characterize the Fourier transform. This characterization is not equivalent to any of the ones above and is a bit unconventional in the following way. Typically, an operator is defined and, among other things, its spectral properties are analyzed, including its eigenvectors. Here, instead, I present the following characterization of the Fourier transform. It is the integral transform $\mathcal{F}$ with kernel $\varphi$ which satisfies
  1. $\mathcal{F}g = g$,
  2. $\varphi(\omega, t) = f(\omega t)$ for some complex-valued $f$,
  3. $\varphi : \mathbb{R}^2 \to \mathbb{C}$ is real analytic,
  4. if $\varphi = c + is$, where $c$ and $s$ are real-valued, then $c$ is even and $s$ is odd,
  5. $c$ and $s$ satisfy the same differential equation, and
  6. $\mathcal{F}$ is an isometry when restricted to a dense subspace of $L^2(\mathbb{R})$.
In essence, the Gaussian is taken to be the defining characteristic for the Fourier transform; the rest is added to ensure uniqueness.
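Before diving into the derivation, it is easy to check numerically that the familiar kernel $\frac{1}{\sqrt{2\pi}}e^{-i\omega t}$ does satisfy property 1. Here is a minimal Python sketch (the function name and quadrature parameters are my own choices):

```python
import math

# Sanity check of property 1 for the standard Fourier kernel
# phi(omega, t) = exp(-i*omega*t)/sqrt(2*pi): it should map the
# Gaussian g(t) = exp(-t^2/2) to itself.

def fourier_of_gaussian(omega, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule approximation of (F g)(omega) = ∫ phi(omega, t) g(t) dt."""
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        phi = complex(math.cos(omega * t), -math.sin(omega * t)) / math.sqrt(2 * math.pi)
        total += phi * math.exp(-t * t / 2) * dt
    return total

for omega in (0.0, 0.5, 1.0, 2.0):
    assert abs(fourier_of_gaussian(omega) - math.exp(-omega ** 2 / 2)) < 1e-6
```

The midpoint rule converges extremely quickly for Gaussian-type integrands, so even this modest grid matches $e^{-\omega^2/2}$ to well beyond the tolerance used.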

Deriving the Fourier Transform

Since $\varphi$ is real analytic, we can express it as a power series which converges everywhere. In the first condition, note that both sides are real-valued. This means in particular that when we integrate $s$ against the Gaussian, we must get zero - otherwise the left hand side would in general be complex. As such, we cannot hope to recover $s$ from property 1 alone. Moreover, if we added any slowly growing odd function (e.g. $\omega t$) to $c$, the integral against the Gaussian would be unchanged. Thus, to have uniqueness, we must require that $c$ be even.

Thus we can restrict our attention to property 1 in the context of $c$ alone:

$$e^{-\omega^2/2} = \int_{-\infty}^{\infty} c(\omega t)\, e^{-t^2/2}\, dt.$$

Writing $c(\eta) = \sum_{n=0}^{\infty} c_n \eta^{2n}$ (note that we are using the assumption that $c$ is even), we get

$$e^{-\omega^2/2} = \int_{-\infty}^{\infty} \sum_{n=0}^{\infty} c_n (\omega t)^{2n} e^{-t^2/2}\, dt.$$

For now, let us forsake rigor and simply interchange the integral and summation; since each integrand is even in $t$, we can also fold the integral onto $[0, \infty)$:

$$e^{-\omega^2/2} = \sum_{n=0}^{\infty} 2 c_n \omega^{2n} \int_0^{\infty} t^{2n} e^{-t^2/2}\, dt.$$

Making the change of variable $z = \frac{t^2}{2}$, this becomes

$$e^{-\omega^2/2} = \sum_{n=0}^{\infty} 2 c_n \omega^{2n} \int_0^{\infty} \left((2z)^{1/2}\right)^{2n-1} e^{-z}\, dz = \sum_{n=0}^{\infty} 2^{n+\frac{1}{2}} c_n \omega^{2n} \int_0^{\infty} z^{n-\frac{1}{2}} e^{-z}\, dz.$$

This integral can be immediately recognized as the gamma function evaluated at $n + \frac{1}{2}$. One of the key properties of the gamma function is that for natural numbers $n$,

$$\Gamma\left(n + \tfrac{1}{2}\right) = \frac{(2n)!\,\sqrt{\pi}}{2^{2n}\, n!}.$$
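This identity is easy to confirm computationally; a quick Python check using the standard library's gamma function:

```python
import math

# Verify Gamma(n + 1/2) = (2n)! * sqrt(pi) / (2^(2n) * n!) for small n.
for n in range(10):
    lhs = math.gamma(n + 0.5)
    rhs = math.factorial(2 * n) * math.sqrt(math.pi) / (4 ** n * math.factorial(n))
    assert math.isclose(lhs, rhs, rel_tol=1e-12), (n, lhs, rhs)
```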

Our expression then becomes

$$e^{-\omega^2/2} = \sqrt{2\pi} \sum_{n=0}^{\infty} \frac{(2n)!\, c_n}{2^n\, n!}\, \omega^{2n}.$$

Since $e^{-\omega^2/2}$ is an analytic function, we can write it as a power series and thus equate coefficients on both sides of the above equation:

$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2^n\, n!}\, \omega^{2n} = \sqrt{2\pi} \sum_{n=0}^{\infty} \frac{(2n)!\, c_n}{2^n\, n!}\, \omega^{2n}.$$

Hence $c_n = \frac{(-1)^n}{\sqrt{2\pi}\,(2n)!}$, which gives

$$c(\eta) = \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\, \eta^{2n} = \frac{1}{\sqrt{2\pi}} \cos(\eta)$$

as desired. At this point, an application of Fubini-Tonelli justifies interchanging our limiting procedures. Alternatively, it can be deduced via uniform convergence of the power series for c.
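As a numerical sanity check of the coefficients just derived, the series $\sqrt{2\pi}\sum_{n} \frac{(2n)!\,c_n}{2^n n!}\omega^{2n}$ with $c_n = \frac{(-1)^n}{\sqrt{2\pi}(2n)!}$ should reproduce $e^{-\omega^2/2}$. A short Python sketch (function name and truncation mine):

```python
import math

# With c_n = (-1)^n / (sqrt(2*pi) * (2n)!), the series
# sqrt(2*pi) * sum_n (2n)! c_n / (2^n n!) * w^(2n) should equal exp(-w^2/2).
def series(w, terms=40):
    total = 0.0
    for n in range(terms):
        c_n = (-1) ** n / (math.sqrt(2 * math.pi) * math.factorial(2 * n))
        total += (math.sqrt(2 * math.pi) * math.factorial(2 * n) * c_n
                  / (2 ** n * math.factorial(n)) * w ** (2 * n))
    return total

for w in (0.0, 0.5, 1.0, 2.0):
    assert math.isclose(series(w), math.exp(-w * w / 2), rel_tol=1e-9)
```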

Now that we have correctly deduced $c$, we are only tasked with determining $s$. To do so, we must consider what differential equation(s) $c$ solves. The most obvious differential equation that $\cos(\eta)$ solves is

$$\frac{d^2}{d\eta^2} \cos(\eta) = -\cos(\eta).$$

As such, we wish to find the odd solution to the equation

$$\frac{d^2}{d\eta^2} f = -f$$

to determine $s$. From basic differential equation theory, it is clear that $f(\eta) = C\sin(\eta)$, where $C$ is to be determined. The way to determine $C$ is by considering condition 6. For our purposes, we need only pick an odd function in $L^2(\mathbb{R})$. The simplest such function is $f(t) = t e^{-t^2/2}$.

Computing $\mathcal{F}f$, and noting that the even part $c$ integrates to zero against the odd function $f$, we get

$$\mathcal{F}f(\omega) = Ci \int_{-\infty}^{\infty} \sin(\omega t)\, t e^{-t^2/2}\, dt = Ci \int_{-\infty}^{\infty} \frac{e^{i\omega t} - e^{-i\omega t}}{2i}\, t e^{-t^2/2}\, dt = \frac{C}{2} \int_{-\infty}^{\infty} t e^{-t^2/2 + i\omega t}\, dt - \frac{C}{2} \int_{-\infty}^{\infty} t e^{-t^2/2 - i\omega t}\, dt$$

Employing the standard completing-the-square trick, we can rewrite the exponents as $-\frac{t^2}{2} + i\omega t = -\frac{t^2}{2} + i\omega t + \frac{\omega^2}{2} - \frac{\omega^2}{2} = -\frac{(t - i\omega)^2}{2} - \frac{\omega^2}{2}$ and similarly $-\frac{t^2}{2} - i\omega t = -\frac{(t + i\omega)^2}{2} - \frac{\omega^2}{2}$. Our integrals then become

$$\mathcal{F}f(\omega) = \frac{C}{2} e^{-\omega^2/2} \left( \int_{-\infty}^{\infty} t e^{-(t - i\omega)^2/2}\, dt - \int_{-\infty}^{\infty} t e^{-(t + i\omega)^2/2}\, dt \right)$$

Naively, one would make the change of variable $z = t \mp i\omega$, but then our integrals get shifted into the complex plane. To remedy this, we simply note that our integrands are entire functions, so by Cauchy's theorem their integral around any closed contour is zero. So if we integrate around boxes with horizontal segments $[-R, R]$ and $[-R \pm i\omega, R \pm i\omega]$ (and their adjoining vertical segments), we get a value of zero.

It is not hard to argue that the value of the contour integral along the vertical segments goes to zero as $R$ tends to infinity; thus the integral of our function along the real axis is equal to its integral along the shifted axis after a change of variable. This justifies the naive change of variable.
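The contour-shift argument can also be spot-checked numerically: integrating $t e^{-(t - i\omega)^2/2}$ along the real axis should agree with integrating $(t + i\omega) e^{-t^2/2}$, which is what the change of variable $z = t - i\omega$ predicts. A rough Python check (grid parameters and test frequency arbitrary):

```python
import cmath
import math

# Midpoint-rule quadrature over a truncated real line.
def riemann(f, lo=-12.0, hi=12.0, n=6000):
    dt = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * dt) for k in range(n)) * dt

w = 1.3  # arbitrary test frequency
left = riemann(lambda t: t * cmath.exp(-(t - 1j * w) ** 2 / 2))
right = riemann(lambda t: (t + 1j * w) * cmath.exp(-t ** 2 / 2))
assert abs(left - right) < 1e-7
# The shared value is i*w*sqrt(2*pi), since t*exp(-t^2/2) is odd.
assert abs(right - 1j * w * math.sqrt(2 * math.pi)) < 1e-7
```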

As such, we are left with evaluating

$$\mathcal{F}f(\omega) = \frac{C}{2} e^{-\omega^2/2} \left( \int_{-\infty}^{\infty} (t + i\omega) e^{-t^2/2}\, dt - \int_{-\infty}^{\infty} (t - i\omega) e^{-t^2/2}\, dt \right).$$

Making use of the oddness of $t e^{-t^2/2}$, this simplifies immediately to

$$\mathcal{F}f(\omega) = iC\omega\, e^{-\omega^2/2} \int_{-\infty}^{\infty} e^{-t^2/2}\, dt = i\sqrt{2\pi}\, C\, \omega\, e^{-\omega^2/2}.$$

Since $f$ is actually an eigenfunction of $\mathcal{F}$ with eigenvalue $iC\sqrt{2\pi}$, the only way for $\mathcal{F}$ to be an isometry on $L^2(\mathbb{R})$ is if $C = \pm\frac{1}{\sqrt{2\pi}}$. Thus we obtain the following functional form for $\varphi$:
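The eigenvalue computation can be confirmed numerically as well. With $C = \frac{1}{\sqrt{2\pi}}$, the transform of $f(t) = t e^{-t^2/2}$ should be $i\omega e^{-\omega^2/2}$. A sketch in Python (only the sine part is computed, since the cosine part vanishes by oddness; quadrature parameters mine):

```python
import math

# With C = 1/sqrt(2*pi), f(t) = t*exp(-t^2/2) should satisfy
# F f(omega) = i * omega * exp(-omega^2/2), i.e. eigenvalue i (modulus 1).
def Ff(omega, lo=-12.0, hi=12.0, n=6000):
    dt = (hi - lo) / n
    C = 1 / math.sqrt(2 * math.pi)
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += 1j * C * math.sin(omega * t) * t * math.exp(-t * t / 2) * dt
    return total

for omega in (0.5, 1.0, 2.0):
    expected = 1j * omega * math.exp(-omega ** 2 / 2)
    assert abs(Ff(omega) - expected) < 1e-8
```

The eigenvalue $i$ has modulus one, which is exactly what the isometry condition demands.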

$$\varphi(\omega t) = \frac{1}{\sqrt{2\pi}}\, e^{\pm i\omega t}$$

and thus the Fourier transform emerges. The $\pm$ is to be expected, since the Fourier transform is only unique up to the sign in the exponent.

Friday, July 24, 2015

An Axiomatic Approach to the Complex Numbers

One question which comes up quite frequently regarding the complex numbers is: why do we define them and their operations the way we do? Everyone knows the old adage about solving quadratic and cubic equations and setting $i^2 = -1$, but in my opinion this doesn't really capture the why of complex numbers. In a previous post, I established the complex numbers as the field $\mathbb{R}[x]/(x^2+1)$. This is a beautiful way to view complex numbers if you happen to know algebra quite well; if you do not, then it is not a very satisfactory characterization. This raises the question: what is the most natural characterization of the complex numbers with the least technical overhead? Recently, I found a characterization that I think captures the essence of complex numbers quite well and should appeal to many people's sensibilities.

In this post, I will go through the characterization and derivation. I have not seen the complex numbers developed axiomatically in quite this way before. While it is likely not novel, it does have a nice simplicity and elegance to it.

Characterizing $\mathbb{C}$

We know from experience that the complex numbers are elements of the form $x + iy$. Alternatively, we can - and often do - view complex numbers as points in the plane $\mathbb{R}^2$. We associate $x + iy$ with the coordinate pair $(x, y)$. In the coordinate pair representation of complex numbers, complex multiplication is given by $(x_1, y_1) \cdot (x_2, y_2) = (x_1x_2 - y_1y_2,\; x_1y_2 + x_2y_1)$. Our goal is to derive this from reasonable axioms. The coordinate pair representation (i.e. the $\mathbb{R}^2$ representation) will be the starting point.
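For concreteness, here is that multiplication rule as a small Python function, checked against the built-in complex type (the function name is my own):

```python
# Coordinate-pair multiplication (x1, y1)*(x2, y2), checked against
# Python's built-in complex arithmetic.
def cmul(p, q):
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

for p in [(1.0, 2.0), (-0.5, 3.0), (0.0, 1.0)]:
    for q in [(4.0, -1.0), (0.0, 1.0), (2.5, 2.5)]:
        z = complex(*p) * complex(*q)
        r = cmul(p, q)
        assert abs(r[0] - z.real) < 1e-12 and abs(r[1] - z.imag) < 1e-12
```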

The coordinate plane $\mathbb{R}^2$ is in fact a vector space over $\mathbb{R}$. It is a standard exercise for first-time linear algebra students to prove this. What this means is that $\mathbb{R}^2$ comes equipped with two operations: vector addition and scalar multiplication. These operations must satisfy the usual vector space axioms.

Vector addition is defined as $(x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\, y_1 + y_2)$. Scalar multiplication is defined by $\alpha(x, y) = (\alpha x, \alpha y)$. These two operations serve as part of the platform for defining the complex numbers.

Before we start deriving $\mathbb{C}$, we must first consider how different vector spaces and fields are. Every field is a vector space over itself, so we know there is some non-trivial relationship between the two. However, not every vector space can be turned into a field. What separates vector spaces from fields is that fields carry an additional structure: multiplication. You can of course define multiplication on vector spaces (e.g. componentwise: $(x_1, \ldots, x_n) \cdot (y_1, \ldots, y_n) = (x_1y_1, \ldots, x_ny_n)$), but it may not always give a meaningful or useful structure. Every nonzero element in a field must have a multiplicative inverse, multiplication must be commutative, and multiplication must distribute over addition. Since we already know that $\mathbb{C}$ is somehow constructed from $\mathbb{R}^2$, what we are tasked with is deriving the multiplication on $\mathbb{C}$; i.e. $\mathbb{C}$ is a "fieldification" of $\mathbb{R}^2$.

Exercise: Check that the componentwise multiplication given in the preceding paragraph does not give a field structure on $\mathbb{R}^n$ except when $n = 1$.
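A hint for the exercise: the componentwise product already fails for $n = 2$ because it has zero divisors, as a one-line check shows:

```python
# The componentwise product on R^2 is not a field multiplication:
# (1, 0) and (0, 1) are both nonzero, yet their product is the zero
# vector, so neither can have a multiplicative inverse.
def componentwise(p, q):
    return (p[0] * q[0], p[1] * q[1])

assert componentwise((1, 0), (0, 1)) == (0, 0)
```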

Let us now list some observations/assumptions:

  1. We can identify $\mathbb{R}$ with the $x$-axis in $\mathbb{R}^2$, meaning that if we multiply $(x_1, 0)$ with $(x_2, 0)$, then we should get $(x_1x_2, 0)$, since this is how we multiply real numbers.
  2. Since $\mathbb{R}^2$ is a vector space over $\mathbb{R}$, we can think of the scalar multiplication equation $\alpha(x, y) = (\alpha x, \alpha y)$ as being equivalent to $(\alpha, 0) \cdot (x, y) = (\alpha x, \alpha y)$.
  3. Since we want a field structure, multiplication must be commutative, that is, $(x_1, y_1) \cdot (x_2, y_2) = (x_2, y_2) \cdot (x_1, y_1)$. Moreover, every nonzero element must have a multiplicative inverse.
  4. It's natural to want that if $(x_1, y_1)$ is a unit vector (in the Euclidean metric, meaning it lies on the unit circle), then $(x_1, y_1) \cdot (x_2, y_2)$ has the same length as $(x_2, y_2)$. This generalizes the fact that $|1 \cdot x| = |x| = |{-1} \cdot x|$, i.e. if you multiply $x$ by a unit vector ($1$ or $-1$ in $\mathbb{R}$), its length is unchanged.
The first two properties can be thought of as pushing the algebraic structure of $\mathbb{R}$ onto $\mathbb{R}^2$; the third is a necessary condition for a field structure; the fourth, however, is a geometric condition which really captures the essence of the complex numbers. Any field extension of $\mathbb{R}$ (that is, a field which contains $\mathbb{R}$) built on $\mathbb{R}^2$ is going to satisfy the first three properties, and there are many multiplications that do this. It turns out that they're all isomorphic, but these other presentations lack the geometry of the complex numbers. Our fairly modest fourth condition is what clinches the deal.

Deriving Complex Multiplication

Since we need our multiplication to distribute over addition, it must be of the form

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 x_1x_2 + \alpha_2 x_1y_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ \beta_1 x_1x_2 + \beta_2 x_1y_2 + \beta_3 y_1x_2 + \beta_4 y_1y_2 \end{pmatrix}.$$

The reason for this is that the map $(x_2, y_2) \mapsto (x_1, y_1) \cdot (x_2, y_2)$ distributes over addition and respects scalar multiplication (by the field axioms and property 2), so it is actually a linear map, and every linear map on $\mathbb{R}^2$ is of the above form. Prove this yourself if you are not convinced. The constants $\alpha_1, \ldots, \alpha_4, \beta_1, \ldots, \beta_4$ must be independent of $(x_1, y_1)$ and $(x_2, y_2)$ in order for multiplication to be well-defined; thus they are universal to our problem at hand.
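To illustrate the bilinear form, here is a sketch with arbitrarily chosen coefficients (the values of $\alpha_i$, $\beta_i$ below are placeholders, not the ones we will derive), confirming that any product of this shape distributes over addition:

```python
# A product of the 8-parameter bilinear form automatically distributes
# over vector addition. The coefficient values here are arbitrary
# placeholders purely for illustration.
A = (1.0, 0.5, -2.0, 3.0)   # alpha_1 .. alpha_4
B = (0.0, 1.0, 1.0, -0.5)   # beta_1 .. beta_4

def mul(p, q):
    x1, y1 = p
    x2, y2 = q
    terms = (x1 * x2, x1 * y2, y1 * x2, y1 * y2)
    return (sum(a * t for a, t in zip(A, terms)),
            sum(b * t for b, t in zip(B, terms)))

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

p, q, r = (1.0, 2.0), (3.0, -1.0), (-2.0, 0.5)
assert mul(p, add(q, r)) == add(mul(p, q), mul(p, r))
```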

Let us consider then what happens when we enforce our first property. If we multiply two "real" numbers (i.e. numbers of the form $(x, 0)$), then our multiplication equation becomes

$$\begin{pmatrix} x_1 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 x_1x_2 \\ \beta_1 x_1x_2 \end{pmatrix}$$

since $y_1 = 0 = y_2$. However, we stated that we should get $(x_1x_2, 0)$ when we multiply $(x_1, 0)$ and $(x_2, 0)$. Equating components gives $\alpha_1 = 1$ and $\beta_1 = 0$.

Let us now consider what happens when we impose our second property. Multiplying $(x_1, 0)$ and $(x_2, y_2)$ gives

$$\begin{pmatrix} x_1 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1x_2 + \alpha_2 x_1y_2 \\ \beta_2 x_1y_2 \end{pmatrix}.$$

Our second condition stated that this should give us $(x_1x_2, x_1y_2)$. Equating components forces $\alpha_2 = 0$ and $\beta_2 = 1$. With these two properties, we've significantly simplified our general expression for multiplication to the following:

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1x_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1y_2 + \beta_3 y_1x_2 + \beta_4 y_1y_2 \end{pmatrix}.$$

By imposing our third condition, $(x_1, y_1) \cdot (x_2, y_2) = (x_2, y_2) \cdot (x_1, y_1)$, we must have that the following is true:

$$\begin{pmatrix} x_1x_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1y_2 + \beta_3 y_1x_2 + \beta_4 y_1y_2 \end{pmatrix} = \begin{pmatrix} x_1x_2 + \alpha_3 x_1y_2 + \alpha_4 y_1y_2 \\ x_2y_1 + \beta_3 x_1y_2 + \beta_4 y_1y_2 \end{pmatrix}.$$

Again equating components, and noting that the cross terms $y_1x_2$ and $x_1y_2$ must agree for all choices of the variables, it follows that $\alpha_3 = 0$ and $\beta_3 = 1$. We are left with the following expression for our multiplication:

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \cdot \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1x_2 + \alpha_4 y_1y_2 \\ x_1y_2 + y_1x_2 + \beta_4 y_1y_2 \end{pmatrix}.$$

At this point, all that is left to determine are $\alpha_4$ and $\beta_4$. Before narrowing these two down to - hopefully - get $\mathbb{C}$ out of the mix, let us see where we stand. The second part of our third condition is that every nonzero element has a multiplicative inverse. In particular, this means that $(x_1, y_1) \cdot (x_2, y_2) = (0, 0)$ if and only if at least one of $(x_1, y_1)$ and $(x_2, y_2)$ is $(0, 0)$.

Viewing multiplication by $(x_1, y_1) \neq (0, 0)$ as a linear map again, the above argument gives that the kernel of the induced linear map is trivial, i.e. its determinant is nonzero. The linear map in question is given by the matrix

$$\begin{pmatrix} x_1 & \alpha_4 y_1 \\ y_1 & x_1 + \beta_4 y_1 \end{pmatrix}.$$

The determinant of this matrix is $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2$. If the matrix is to be invertible for every nonzero $(x_1, y_1)$, then the determinant can never vanish, i.e. $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2 = 0$ must have no nonzero solutions. Considering $x_1$ to be the dependent variable, the only way for this to happen is if the discriminant is negative (which corresponds to non-real solutions, though this is putting the cart before the horse). The discriminant in this case is $\beta_4^2 y_1^2 + 4\alpha_4 y_1^2$.

Since $y_1 \neq 0$, we have $\beta_4^2 y_1^2 + 4\alpha_4 y_1^2 < 0$ if and only if $\beta_4^2 + 4\alpha_4 < 0$. Equivalently, $\beta_4^2 < -4\alpha_4$. Picking any $\alpha_4$ and $\beta_4$ which satisfy this condition necessarily gives rise to a field structure, as desired. However, as stated above, these fields do not all capture the geometry we would like to associate with the complex numbers. As such, it is not obvious from the first three axioms alone what we should pick as the definition of the complex numbers. Enter our fourth property.
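To see the discriminant condition in action, pick, say, $\alpha_4 = -2$ and $\beta_4 = 1$ (so $\beta_4^2 + 4\alpha_4 = -7 < 0$) and check that the determinant is positive at every nonzero point sampled:

```python
# When beta4^2 + 4*alpha4 < 0, the determinant x1^2 + beta4*x1*y1 - alpha4*y1^2
# is a positive-definite quadratic form, so every nonzero (x1, y1) acts
# invertibly. Spot-check one admissible pair of parameters.
alpha4, beta4 = -2.0, 1.0
assert beta4 ** 2 + 4 * alpha4 < 0  # the field condition holds

for x1 in [-3, -1, 0, 1, 2]:
    for y1 in [-2, 0, 1, 3]:
        if (x1, y1) == (0, 0):
            continue
        det = x1 ** 2 + beta4 * x1 * y1 - alpha4 * y1 ** 2
        assert det > 0, (x1, y1, det)
```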

Our fourth property - while seemingly very modest - is actually a very strong condition: it forces a lot of geometry onto our field. Let us now invoke it in two different ways. Consider $(0, 1)$. This is a unit vector, so $(0, 1) \cdot (0, 1)$ should have length one. Carrying out the multiplication, we get $(\alpha_4, \beta_4)$. The only way for this vector to have length one is if $\alpha_4^2 + \beta_4^2 = 1$, i.e. $\alpha_4 = \pm\sqrt{1 - \beta_4^2}$.

With a concrete relationship between $\alpha_4$ and $\beta_4$, we can now sort out what the values must be. Let us multiply $(0, 1)$ and $(\cos\theta, \sin\theta)$. Both of these are unit vectors, so the length of the resulting vector must be one, as above. Multiplying the vectors, we get $\left(\pm\sqrt{1 - \beta_4^2}\, \sin\theta,\; \cos\theta + \beta_4 \sin\theta\right)$.

The length of this vector is $\sqrt{(1 - \beta_4^2)\sin^2\theta + \cos^2\theta + 2\beta_4 \sin\theta\cos\theta + \beta_4^2 \sin^2\theta}$, which simplifies to $\sqrt{1 + \beta_4 \sin 2\theta}$. This can only equal $1$ if $\beta_4 = 0$ or $\theta = 0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}$. Picking any $\theta$ other than those values, it follows that $\beta_4$ must be zero, since $\beta_4$ must be universal. Thus $\alpha_4 = \pm 1$. It is quite straightforward to show that the only choice of $\alpha_4$ which preserves lengths is $\alpha_4 = -1$; indeed, $\alpha_4 = 1$ also violates the invertibility condition $\beta_4^2 + 4\alpha_4 < 0$.
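The length computation above is easy to verify numerically; a small Python sketch (function name mine):

```python
import math

# Squared length of (0,1)*(cos(theta), sin(theta)) under the multiplication
# with alpha4 = sqrt(1 - beta4^2). It equals 1 + beta4*sin(2*theta), so any
# beta4 != 0 breaks the unit-length requirement at generic angles.
def length_sq(beta4, theta):
    alpha4 = math.sqrt(1 - beta4 ** 2)      # from alpha4^2 + beta4^2 = 1
    x = alpha4 * math.sin(theta)            # first coordinate of the product
    y = math.cos(theta) + beta4 * math.sin(theta)
    return x * x + y * y

theta = math.pi / 4
assert abs(length_sq(0.5, theta) - (1 + 0.5 * math.sin(2 * theta))) < 1e-12
assert length_sq(0.5, theta) > 1 + 1e-9     # beta4 = 0.5 breaks the isometry
assert abs(length_sq(0.0, theta) - 1) < 1e-12  # beta4 = 0 preserves length
```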

Thus the multiplication on $\mathbb{R}^2$ we posited reduces to

$$(x_1, y_1) \cdot (x_2, y_2) = (x_1x_2 - y_1y_2,\; x_1y_2 + y_1x_2)$$

which is the usual multiplication on $\mathbb{C}$! We also get that $(0, 1) \cdot (0, 1) = (-1, 0)$, which is a restatement of $i^2 = -1$. Moreover, in this procedure the matrix representation of complex numbers fell out quite naturally along the way.

The other choices of $\alpha_4$ and $\beta_4$ satisfying $\beta_4^2 < -4\alpha_4$ give rise to "different" fields, though really they are all equivalent to $\mathbb{C}$: by virtue of having a negative discriminant, we ensured that the quadratic polynomial they solve is not solvable over $\mathbb{R}$. Quotienting $\mathbb{R}[x]$ by such a polynomial gives a degree-two field extension, which is isomorphic to $\mathbb{C}$.