Mathematics Abound: An Axiomatic Approach to the Complex Numbers

One question which comes up quite frequently regarding the complex numbers is: why do we define them and their operations the way we do? Everyone knows the old story about solving quadratic and cubic equations and setting $i^2 = -1$, but in my opinion this doesn't really capture the why of complex numbers. In a previous post, I established the complex numbers as the field $\mathbb{R}[x]/(x^2+1)$. This is a beautiful way to view complex numbers if you happen to know algebra quite well; if you do not, then this is not a very satisfactory characterization. This raises the question: what is the most natural characterization of the complex numbers with the least technical overhead? Recently, I found a characterization that I think captures the essence of complex numbers quite well and that should appeal to many people's sensibilities.

In this post, I will go through the characterization and derivation. I have not seen this approach presented elsewhere as an axiomatic treatment of the complex numbers. While the underlying ideas are not novel, the approach does have a nice simplicity and elegance to it.

Characterizing $\mathbb{C}$

We know from experience that the complex numbers are elements of the form $x+iy$. Alternatively, we can - and often do - view complex numbers as points in the $\mathbb{R}^2$ plane. We associate $x+iy$ with the coordinate pair $(x,y)$. In the coordinate pair representation of complex numbers, complex multiplication is given by $(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2, x_1y_2 + x_2y_1)$. Our goal is to derive this from reasonable axioms. The coordinate pair representation (i.e. the $\mathbb{R}^2$ representation) will be the starting point.
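As a quick sanity check of the pair formula above, here is a minimal Python sketch. The function name `pair_mul` is hypothetical and chosen only for illustration; the sketch simply compares the pair product with Python's built-in `complex` type.

```python
def pair_mul(p, q):
    """Multiply two complex numbers stored as coordinate pairs (x, y)."""
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


# Compare against Python's built-in complex arithmetic.
p, q = (1.0, 2.0), (3.0, -4.0)
builtin = complex(*p) * complex(*q)
print(pair_mul(p, q))                 # (11.0, 2.0)
print((builtin.real, builtin.imag))   # (11.0, 2.0)
```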

The coordinate plane $\mathbb{R}^2$ is in fact a vector space over $\mathbb{R}$. It is a standard exercise for first-time linear algebra students to prove this. What this means is that $\mathbb{R}^2$ comes equipped with two operations: vector addition and scalar multiplication. These operations must satisfy the usual vector space axioms.

Vector addition is defined as follows: $(x_1,y_1) + (x_2,y_2) = (x_1+x_2,y_1+y_2)$. Scalar multiplication is defined by the expression $\alpha(x,y) = (\alpha x,\alpha y)$. These two operations serve as part of the platform for defining the complex numbers.

Before we start deriving $\mathbb{C}$, we must first consider how vector spaces and fields differ. Every field is a vector space over itself, so we know that there is some non-trivial relationship between the two. However, not every vector space can be turned into a field. What separates vector spaces and fields is that fields carry an additional structure: multiplication. You can of course define multiplication on vector spaces (e.g. $(x_1,\ldots, x_n)(y_1,\ldots, y_n) = (x_1y_1,\ldots,x_ny_n)$) but it may not always give a meaningful or useful structure. Every nonzero element in a field must have a multiplicative inverse, multiplication must be commutative, and multiplication must distribute over addition. Since we already know that $\mathbb{C}$ is somehow constructed from $\mathbb{R}^2$, what we are tasked with is deriving the multiplication on $\mathbb{C}$, i.e. $\mathbb{C}$ is a fieldification of $\mathbb{R}^2$.

Exercise: Check that the multiplication given in the preceding paragraph does not give a field on $\mathbb{R}^n$ except when $n=1$.
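As a hint for the exercise, recall that a field cannot contain two nonzero elements whose product is zero. The short Python sketch below (the helper name `componentwise_mul` is hypothetical) exhibits such a pair for $n = 2$; it is meant as a nudge, not a full solution.

```python
def componentwise_mul(u, v):
    """Multiply two vectors of equal length entry by entry."""
    return tuple(a * b for a, b in zip(u, v))


# Two nonzero vectors in R^2 whose componentwise product is the zero vector.
u, v = (1, 0), (0, 1)
print(componentwise_mul(u, v))  # (0, 0)
```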

Let us now list some observations/assumptions:

1. We can identify $\mathbb{R}$ with the $x$ axis in $\mathbb{R}^2$. This means that if we multiply $(x_1,0)$ with $(x_2,0)$, then we should get $(x_1x_2,0)$, since this is how we multiply real numbers.
2. Since $\mathbb{R}^2$ is a vector space over $\mathbb{R}$, we can think of the scalar multiplication equation $\alpha(x,y) = (\alpha x,\alpha y)$ as being equivalent to $(\alpha, 0)(x,y) = (\alpha x,\alpha y)$.
3. Since we want a field structure, we need multiplication to be commutative, that is to say $(x_1,y_1)(x_2,y_2) = (x_2,y_2)(x_1,y_1)$. Moreover, every nonzero element must have a multiplicative inverse.
4. It's natural to want that if $(x_1,y_1)$ is a unit vector (in the Euclidean metric, meaning it lies on the unit circle), then $(x_1,y_1)(x_2,y_2)$ has the same length as $(x_2,y_2)$. This is a generalization of the fact that $|1\cdot x| = |x| = |-1 \cdot x|$, i.e. if you multiply $x$ by a unit vector ($1$ or $-1$ in $\mathbb{R}$), then its length is unchanged.

The first two properties can be thought of as pushing the algebraic structure of $\mathbb{R}$ to $\mathbb{R}^2$; the third is a necessary condition for a field structure; the fourth, however, is a geometric condition which really captures the essence of the complex numbers. Any field structure on $\mathbb{R}^2$ that extends $\mathbb{R}$ (that is, a field which contains $\mathbb{R}$ as the $x$ axis) is going to satisfy the first three properties, and there are many multiplications that do this. It turns out that the resulting fields are all isomorphic, but these other fields lack the geometry of complex numbers. Our fairly modest fourth condition is what clinches the deal.

Deriving Complex Multiplication

Since we need our multiplication to distribute over addition, it must be of the form

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} \alpha_1 x_1x_2 + \alpha_2 x_1 y_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ \beta_1 x_1x_2 + \beta_2 x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right).$$

The reason for this is that since the map $(x_2,y_2)\mapsto(x_1,y_1)(x_2,y_2)$ distributes over addition and scalar multiplication (by the field axioms), it is actually a linear map. The same is true of the map $(x_1,y_1)\mapsto(x_1,y_1)(x_2,y_2)$, so the product is linear in each factor separately, i.e. it is bilinear. Every bilinear map from $\mathbb{R}^2\times\mathbb{R}^2$ to $\mathbb{R}^2$ must be of the above form. Prove this yourself if you are not convinced. The constants $\alpha_1,\ldots,\alpha_4,\beta_1,\ldots,\beta_4$ must be independent of $(x_1,y_1)$ and $(x_2,y_2)$ in order for multiplication to be well-defined. Thus they are universal to our problem at hand.
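To make the general form concrete, here is a small Python sketch of a product with eight free coefficients. The function name `bilinear_mul` and the tuple layout of the coefficients are assumptions made for illustration; the sample coefficients at the end are the ones that reproduce ordinary complex multiplication.

```python
def bilinear_mul(p, q, alpha, beta):
    """General product on R^2 with eight free coefficients.

    alpha and beta are 4-tuples holding the coefficients of
    x1*x2, x1*y2, y1*x2, y1*y2 in the first and second output component.
    """
    x1, y1 = p
    x2, y2 = q
    monomials = (x1 * x2, x1 * y2, y1 * x2, y1 * y2)
    first = sum(a * m for a, m in zip(alpha, monomials))
    second = sum(b * m for b, m in zip(beta, monomials))
    return (first, second)


# The usual complex multiplication corresponds to these coefficients.
ALPHA_C = (1, 0, 0, -1)
BETA_C = (0, 1, 1, 0)
print(bilinear_mul((1, 2), (3, -4), ALPHA_C, BETA_C))  # (11, 2)
```

Whatever coefficients are chosen, a product of this shape distributes over addition in each factor, which is the property used in the argument above.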

Let us consider then what happens when we enforce our first property. If we multiply two "real" numbers (i.e. numbers of the form $(x,0)$), then our multiplication equation becomes

$$\left(\begin{array}{c} x_1 \\ 0 \end{array}\right)\left(\begin{array}{c} x_2 \\ 0\end{array}\right) = \left(\begin{array}{c} \alpha_1 x_1x_2 \\ \beta_1 x_1x_2\end{array}\right)$$

since $y_1 = 0 = y_2$. However we stated that we should get $(x_1x_2,0)$ when we multiply $(x_1,0)$ and $(x_2,0)$. This gives that $\alpha_1 = 1$ and $\beta_1 = 0$ by equating our components.

Let us now consider what happens when we impose our second property. Multiplying $(x_1,0)$ and $(x_2,y_2)$ gives

$$\left(\begin{array}{c} x_1 \\ 0\end{array}\right)\left(\begin{array}{c} x_2 \\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 + \alpha_2 x_1y_2 \\ \beta_2 x_1y_2\end{array}\right).$$

Our second condition stated that this should give us $(x_1 x_2, x_1y_2)$. Equating components forces $\alpha_2 = 0$ and $\beta_2 = 1$. With these two properties, we've significantly simplified our general expression for multiplication to the following:

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right).$$

By imposing our third condition of $(x_1,y_1)(x_2,y_2) = (x_2,y_2)(x_1,y_1)$, we must have that the following is true:

$$\left(\begin{array}{c} x_1x_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right) = \left(\begin{array}{c} x_1x_2 + \alpha_3 x_1y_2 + \alpha_4 y_1y_2 \\ y_1x_2 + \beta_3 x_1y_2 + \beta_4 y_1y_2\end{array}\right).$$

Again we equate components. The first components agree for all inputs only if $\alpha_3 y_1x_2 = \alpha_3 x_1y_2$, which forces $\alpha_3 = 0$. The second components agree only if $(1-\beta_3)(x_1y_2 - y_1x_2) = 0$ for all inputs, which forces $\beta_3 = 1$. We are left with the following expression for our multiplication:

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 + \alpha_4 y_1y_2 \\ x_1y_2 + y_1x_2 + \beta_4 y_1 y_2\end{array}\right).$$
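The two-parameter family above can be spot-checked against the first three properties with a short Python sketch. The name `family_mul` and the sample values of `a4` and `b4` (standing for $\alpha_4$ and $\beta_4$) are arbitrary choices for illustration.

```python
def family_mul(p, q, a4, b4):
    """Product left after imposing the first three properties.

    a4 and b4 stand for the undetermined constants alpha_4 and beta_4.
    """
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 + a4 * y1 * y2, x1 * y2 + y1 * x2 + b4 * y1 * y2)


a4, b4 = -3, 2  # arbitrary sample values, not special in any way
p, q = (2, 5), (-1, 4)

# Property 1: real numbers multiply as usual.
print(family_mul((2, 0), (7, 0), a4, b4))  # (14, 0)
# Property 2: multiplying by (alpha, 0) is scalar multiplication.
print(family_mul((3, 0), q, a4, b4))       # (-3, 12)
# Property 3 (first half): the product is commutative.
print(family_mul(p, q, a4, b4) == family_mul(q, p, a4, b4))  # True
```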

At this point, all that is left to determine are $\alpha_4$ and $\beta_4$. Before narrowing these two down to - hopefully - get $\mathbb{C}$ out of the mix, let us see where we stand. The second part of our third condition is that every nonzero element has a multiplicative inverse. In particular, this means that $(x_1,y_1)(x_2,y_2) = (0,0)$ if and only if at least one of $(x_1,y_1)$ and $(x_2,y_2)$ is $(0,0)$.

Viewing multiplication by $(x_1,y_1) \neq (0,0)$ as a linear mapping again, the above argument gives that the kernel of the induced linear map is trivial, i.e. its determinant is nonzero. The linear map in question is of the form

$$\left(\begin{array}{cc} x_1 & \alpha_4 y_1 \\ y_1 & x_1 + \beta_4 y_1\end{array}\right).$$

The determinant of this matrix is $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2$. If the matrix is to be invertible for every $(x_1,y_1)\neq(0,0)$, then the determinant can never be zero there, i.e. the equation $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2 = 0$ has no real solution other than $x_1 = y_1 = 0$. When $y_1 = 0$ the determinant is $x_1^2$, which is nonzero because $x_1\neq 0$. When $y_1\neq 0$, the only way for this to happen is if the discriminant is negative (which corresponds to non-real solutions, though this is putting the cart before the horse). Considering $x_1$ to be the unknown, the discriminant in this case is $\beta_4^2 y_1^2 + 4\alpha_4 y_1^2$.

Since $y_1\neq 0$, we have that $\beta_4^2y_1^2 + 4\alpha_4y_1^2 < 0$ if and only if $\beta_4^2 + 4\alpha_4 < 0$. Equivalently, $\beta_4^2 < -4\alpha_4$, which in particular forces $\alpha_4 < 0$. Picking any $\alpha_4$ and $\beta_4$ which satisfy this condition would necessarily give rise to a field structure as we desired. However, as stated above, these fields do not capture the geometry we would like to associate to the complex numbers. As such, it is not obvious from the first three axioms what we should pick as the definition of the complex numbers. Enter our fourth property.
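To make the invertibility condition concrete, the sketch below treats the determinant $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2$ as a quadratic in $x_1$ with $y_1 = 1$ and computes its discriminant from the standard formula $b^2 - 4ac$ with $a = 1$, $b = \beta_4$, $c = -\alpha_4$. The helper names `det` and `zero_divisor` are hypothetical, and only the case $y_1 \neq 0$ is examined, since for $y_1 = 0$ the determinant is just $x_1^2$.

```python
import math


def det(x1, y1, a4, b4):
    """Determinant of the matrix for multiplication by (x1, y1)."""
    return x1 * x1 + b4 * x1 * y1 - a4 * y1 * y1


def zero_divisor(a4, b4):
    """Return a nonzero (x1, 1) with zero determinant, or None if none exists.

    With y1 = 1 the determinant is x1^2 + b4*x1 - a4, a quadratic in x1
    whose discriminant is b4^2 + 4*a4.
    """
    disc = b4 * b4 + 4 * a4
    if disc < 0:
        return None  # no real root, so the determinant never vanishes
    return ((-b4 + math.sqrt(disc)) / 2, 1.0)


print(zero_divisor(-1, 0))  # None: the choice that gives the complex numbers
print(zero_divisor(1, 0))   # (1.0, 1.0), and det(1.0, 1.0, 1, 0) is 0
```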

Our fourth property - while seemingly very modest - is actually a very strong condition. It forces a lot of geometry into our field. Let us now invoke our fourth property in two different ways. Consider $(0,1)$. This is a unit vector so $(0,1)(0,1)$ should have length one. When we multiply these what we get is $(\alpha_4, \beta_4)$. The only way for this vector to have length one is if $\alpha_4^2 + \beta_4^2 = 1$, i.e. $\alpha_4 = \pm \sqrt{1-\beta_4^2}$.

With a concrete relationship between $\alpha_4$ and $\beta_4$, we can now sort out what the values must be. Let us multiply $(0,1)$ and $(\cos\theta,\sin\theta)$. Both of these are unit vectors so the length of the resulting vector must be one as above. Multiplying the vectors, we get $\left(\pm\sqrt{1-\beta_4^2}\sin\theta, \cos\theta + \beta_4\sin\theta\right)$.

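The squared length of this vector is $(1-\beta_4^2)\sin^2\theta + \cos^2\theta + 2\beta_4\sin\theta\cos\theta + \beta_4^2\sin^2\theta$, which simplifies to $1 + \beta_4\sin2\theta$. This can only be $1$ if $\beta_4 = 0$ or $\sin2\theta = 0$, i.e. $\theta$ is a multiple of $\frac{\pi}{2}$. Picking any other $\theta$, it follows that $\beta_4$ must be zero since $\beta_4$ must be universal. Thus $\alpha_4 = \pm 1$. The choice $\alpha_4 = 1$ does not preserve lengths: it would give $\left(\frac{1}{\sqrt2},\frac{1}{\sqrt2}\right)\left(\frac{1}{\sqrt2},-\frac{1}{\sqrt2}\right) = (0,0)$, a product of two unit vectors with length zero. With $\alpha_4 = -1$, on the other hand, one checks that $(x_1x_2 - y_1y_2)^2 + (x_1y_2 + y_1x_2)^2 = (x_1^2+y_1^2)(x_2^2+y_2^2)$, so lengths are indeed preserved. Hence $\alpha_4 = -1$.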
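The simplification to $1 + \beta_4\sin2\theta$ for the squared length of the product can be spot-checked numerically. The sketch below is only a sanity check at one arbitrary sample point, with a hypothetical helper name; it is not a proof.

```python
import math


def squared_length_after_i(theta, b4):
    """Squared length of (0,1)*(cos t, sin t) when alpha_4^2 + beta_4^2 = 1."""
    a4 = math.sqrt(1 - b4 * b4)  # the sign of alpha_4 does not affect the length
    first = a4 * math.sin(theta)
    second = math.cos(theta) + b4 * math.sin(theta)
    return first * first + second * second


theta, b4 = 0.7, 0.3  # arbitrary sample values
lhs = squared_length_after_i(theta, b4)
rhs = 1 + b4 * math.sin(2 * theta)
print(math.isclose(lhs, rhs))  # True
```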

Thus the multiplication on $\mathbb{R}^2$ we posited reduces to

$$(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2, x_1y_2 + y_1x_2)$$

which is the usual multiplication on $\mathbb{C}$! We also get that $(0,1)(0,1) = (-1,0)$ which is a restatement of $i^2 = -1$. Moreover, in this procedure the matrix representation of complex numbers fell right out quite naturally in our considerations.
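A minimal Python sketch of the end result, taking $\alpha_4 = -1$ and $\beta_4 = 0$: it checks $i^2 = -1$, checks at one sample point that multiplying by a unit vector preserves length, and applies the matrix $\left(\begin{array}{cc} x & -y \\ y & x\end{array}\right)$ that represents multiplication by $(x,y)$. All function names are hypothetical.

```python
import math


def complex_mul(p, q):
    """Derived product: alpha_4 = -1 and beta_4 = 0."""
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)


def as_matrix(p):
    """2x2 matrix of the linear map q -> p*q, as nested tuples."""
    x, y = p
    return ((x, -y), (y, x))


def apply(m, q):
    """Apply a 2x2 matrix to a pair."""
    return (m[0][0] * q[0] + m[0][1] * q[1], m[1][0] * q[0] + m[1][1] * q[1])


print(complex_mul((0, 1), (0, 1)))  # (-1, 0), i.e. i^2 = -1

u = (math.cos(1.1), math.sin(1.1))  # a unit vector
q = (3.0, -4.0)                     # a vector of length 5
print(math.isclose(math.hypot(*complex_mul(u, q)), 5.0))  # True
print(apply(as_matrix((1, 2)), (3, -4)))  # (11, 2)
```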

The other choices of $\alpha_4$ and $\beta_4$ give rise to "different" fields, though really they are equivalent to $\mathbb{C}$. The reason is that $(0,1)(0,1) = (\alpha_4,\beta_4)$ makes the element $(0,1)$ a root of the polynomial $t^2 - \beta_4 t - \alpha_4$, and by virtue of having a negative discriminant, this polynomial has no roots in $\mathbb{R}$. Quotienting $\mathbb{R}[x]$ by this polynomial gives a degree two field extension of $\mathbb{R}$, which is isomorphic to $\mathbb{C}$.
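One way to see the isomorphism explicitly, offered as a sketch rather than as the only route: since $(0,1)(0,1) = (\alpha_4,\beta_4)$, the element $j = (0,1)$ satisfies $j^2 = \alpha_4 + \beta_4 j$, so we can send $j$ to a complex root of $t^2 - \beta_4 t - \alpha_4$ and check that products are respected. The function names and sample values below are assumptions for illustration, and the check is at a single sample point.

```python
import cmath


def family_mul(p, q, a4, b4):
    """Product on R^2 determined by the constants alpha_4 and beta_4."""
    x1, y1 = p
    x2, y2 = q
    return (x1 * x2 + a4 * y1 * y2, x1 * y2 + y1 * x2 + b4 * y1 * y2)


def to_complex(p, a4, b4):
    """Send (x, y) to x + y*j, where j is a complex root of t^2 - b4*t - a4."""
    j = (b4 + cmath.sqrt(b4 * b4 + 4 * a4)) / 2
    return p[0] + p[1] * j


a4, b4 = -3, 2  # sample values for which t^2 - b4*t - a4 has no real root
p, q = (2, 5), (-1, 4)
lhs = to_complex(family_mul(p, q, a4, b4), a4, b4)
rhs = to_complex(p, a4, b4) * to_complex(q, a4, b4)
print(cmath.isclose(lhs, rhs))  # True
```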

Friday, July 24, 2015