Sunday, July 26, 2015

A New Characterization of the Fourier Transform

The Fourier transform arises in many different ways. Historically, it first arose as a sort of limiting procedure for Fourier series. The goal was to extend the theory of Fourier series to functions which were aperiodic. It arose again in the context of Banach algebras as a special case of what is known as a Gelfand transform. The Fourier transform can also be arrived at by considering the spectral properties of the Laplace operator - this is quite similar in nature to the way it was discovered historically. Perhaps the most elegant approach to establishing the Fourier transform is as a lifted representation of characters on $\mathbb{R}$ to the $L^1(\mathbb{R})$ algebra. I will discuss this last characterization in an upcoming post.

Characterizing the Fourier Transform

The Gaussian - as any mathematician or physicist knows - plays an instrumental role in mathematics. It arises in the central limit theorem, in Brownian motion, as a Green's function for the heat equation, and in many other places. In particular, the Gaussian, denoted $g$, plays a critical role in the theory of the Fourier transform: it is not only an eigenfunction of the Fourier transform with eigenvalue one (meaning $\mathcal{F}g=g$), but it is also the minimizer of the Heisenberg uncertainty product. In the literature, the role the Gaussian plays is often viewed as a happy coincidence.

In this post, I give a new way to characterize the Fourier transform. This characterization is not equivalent to any of the ones above and is a bit unconventional in the following way. Typically, an operator is defined and, among other things, its spectral properties are analyzed, including its eigenvectors. In this post, I present the following characterization of the Fourier transform: it is the integral transform $\mathcal{F}$ with kernel $\varphi$ which satisfies
  1. $\mathcal{F}g = g$,
  2. $\varphi(\omega, t) = f(\omega t)$ for some complex-valued $f$,
  3. $\varphi:\mathbb{R}^2\to\mathbb{C}$ is real analytic,
  4. If $\varphi = c + is$, where $c$ and $s$ are real-valued, then $c$ is even and $s$ is odd,
  5. $c$ and $s$ satisfy the same differential equation, and
  6. $\mathcal{F}$ is an isometry when restricted to a dense subspace of $L^2(\mathbb{R})$.
In essence, the Gaussian is taken to be the defining characteristic for the Fourier transform; the rest is added to ensure uniqueness.
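Before diving into the derivation, here is a quick numerical sanity check of property 1 for the kernel we will ultimately arrive at, namely $\frac{1}{\sqrt{2\pi}}e^{-i\omega t}$. This is only a rough sketch (it assumes numpy and approximates the integral by a Riemann sum on a truncated grid); it plays no role in the derivation itself.

```python
import numpy as np

# Sanity check of property 1: the Gaussian g(t) = exp(-t^2/2) should be fixed
# by the transform with kernel exp(-i*omega*t)/sqrt(2*pi).
t = np.linspace(-20, 20, 40001)              # the integrand is negligible outside this window
dt = t[1] - t[0]
g = np.exp(-t**2 / 2)

for omega in [0.0, 0.5, 1.0, 2.0, 3.5]:
    kernel = np.exp(-1j * omega * t) / np.sqrt(2 * np.pi)
    Fg = np.sum(kernel * g) * dt             # approximates the integral over the real line
    print(f"omega={omega}: Fg = {Fg.real:.6f}, g(omega) = {np.exp(-omega**2 / 2):.6f}")
```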

Deriving the Fourier Transform

Since $\varphi$ is real analytic, we can express it as a power series which converges everywhere. In our first condition, note that both sides are real-valued. This particularly means that when we integrate $s$ against the Gaussian, we must get zero - else the left hand side would be complex in general. As such, we cannot hope to uncover what $s$ must be from property 1. Moreover, if we added any slowly growing odd function (e.g. $\omega t$) to $c$, the integration against the Gaussian would be unchanged. Thus to have uniqueness we must require that $c$ be even.

Thus we can restrict our attention to property 1 in the context of only $c$:

$$e^{-\frac{\omega^2}{2}} = \int_{-\infty}^{\infty} c(\omega t) e^{-\frac{t^2}{2}}\,dt.$$

Writing $c(\eta) = \sum_{n=0}^{\infty} c_n \eta^{2n}$ (note that we are using the assumption that $c$ is even), we get

$$e^{-\frac{\omega^2}{2}} = \int_{-\infty}^{\infty} \sum_{n=0}^{\infty} c_n(\omega t)^{2n} e^{-\frac{t^2}{2}}\,dt.$$

For now let us forsake rigor and simply interchange integral and summation, using the evenness of the integrand in $t$ to write the integral over $[0,\infty)$:

$$e^{-\frac{\omega^2}{2}} = \sum_{n=0}^{\infty} 2 c_n \omega^{2n}\int_0^{\infty} t^{2n}e^{-\frac{t^2}{2}}\,dt.$$

Making a change of variable $z = \frac{t^2}{2}$, this becomes

$$e^{-\frac{\omega^2}{2}} = \sum_{n=0}^{\infty} 2 c_n \omega^{2n}\int_0^{\infty} ((2z)^{\frac{1}{2}})^{2n-1} e^{-z}\,dz = \sum_{n=0}^{\infty}2^{n+\frac{1}{2}} c_n\omega^{2n} \int_0^{\infty} z^{n-\frac{1}{2}} e^{-z}\,dz.$$

This integral can be immediately recognized as the gamma function evaluated at $n+\frac{1}{2}$. One of the key properties of the gamma function is that for natural numbers $n$,

$$\Gamma\left(n+\frac{1}{2}\right) = \frac{(2n)!\sqrt{\pi}}{2^{2n} n!}.$$

Our expression then becomes

$$e^{-\frac{\omega^2}{2}}=\sqrt{2\pi}\sum_{n=0}^{\infty}\frac{(2n)!c_n}{2^nn!} \omega^{2n}.$$

Since $e^{-\frac{\omega^2}{2}}$ is an analytic function, we can write it as a power series and thus equate coefficients on both sides of the above equation:

$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n!}\omega^{2n} = \sqrt{2\pi}\sum_{n=0}^{\infty} \frac{(2n)!c_n}{2^nn!} \omega^{2n}.$$

Hence $c_n = \frac{(-1)^n}{\sqrt{2\pi}\,(2n)!}$, which gives

$$c(\eta) = \frac{1}{\sqrt{2\pi}}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\eta^{2n} = \frac{1}{\sqrt{2\pi}}\cos(\eta)$$

as desired. At this point, an application of Fubini-Tonelli justifies interchanging our limiting procedures. Alternatively, it can be deduced via uniform convergence of the power series for $c$.
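This identity is also easy to confirm with a computer algebra system. Here is a minimal sketch assuming sympy's `integrate` can evaluate the Gaussian integral in closed form; if it cannot, a numerical check along the lines of the snippet above works just as well.

```python
import sympy as sp

t, omega = sp.symbols('t omega', real=True)

# The even part of the kernel derived above: c(omega*t) = cos(omega*t)/sqrt(2*pi).
c = sp.cos(omega * t) / sp.sqrt(2 * sp.pi)
lhs = sp.integrate(c * sp.exp(-t**2 / 2), (t, -sp.oo, sp.oo))

# Property 1 demands that we get the Gaussian back: exp(-omega^2/2).
print(sp.simplify(lhs - sp.exp(-omega**2 / 2)))   # expect 0
```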

Now that we have correctly deduced $c$, we are only tasked with determining $s$. To do so, we must consider what differential equation(s) $c$ solves. The most obvious differential equation that $\cos(\eta)$ solves is

$$\frac{d^2}{d\eta^2}\cos(\eta) = -\cos(\eta).$$

As such, we wish to find the odd solution to the equation

$$\frac{d^2}{d\eta^2} s = -s$$

in order to determine $s$. From basic differential equation theory, it is clear that $s(\eta) = C\sin(\eta)$, where $C$ is to be determined. The way to determine $C$ is to consider condition 6. For our purposes, we need only pick an odd function in $L^2(\mathbb{R})$. The simplest such function is $f(t) = te^{-\frac{t^2}{2}}$.

Computing $\mathcal{F}f$, we get

\begin{eqnarray*}
\mathcal{F}f(\omega) &=& Ci\int_{-\infty}^{\infty} \sin(\omega t)t e^{-\frac{t^2}{2}}\,dt \\
&=& Ci\int_{-\infty}^{\infty} \frac{e^{i\omega t} - e^{-i\omega t}}{2i} t e^{-\frac{t^2}{2}}\,dt \\
&=& \frac{C}{2}\int_{-\infty}^{\infty} te^{-\frac{t^2}{2}+i\omega t}\,dt - \frac{C}{2}\int_{-\infty}^{\infty} t e^{-\frac{t^2}{2} - i\omega t}\,dt
\end{eqnarray*}

Employing the standard completing the square trick, we can rewrite these as $-\frac{t^2}{2} + i\omega t = -\frac{t^2}{2} + i\omega t + \frac{\omega^2}{2} - \frac{\omega^2}{2} = -\left(\frac{t}{\sqrt{2}} -i\frac{\omega}{\sqrt{2}}\right)^2 - \frac{\omega^2}{2}$ and similarly $-\frac{t^2}{2} - i\omega t = -\frac{t^2}{2} - i\omega t + \frac{\omega^2}{2} - \frac{\omega^2}{2} = -\left(\frac{t}{\sqrt{2}} + i \frac{\omega}{\sqrt{2}}\right)^2 - \frac{\omega^2}{2}$. Our integrals then become

$$\mathcal{F}f(\omega) = \frac{C}{2}e^{-\frac{\omega^2}{2}} \left(\int_{-\infty}^{\infty} t e^{-\left(\frac{t-i\omega}{\sqrt{2}}\right)^2}\,dt - \int_{-\infty}^{\infty} t e^{-\left(\frac{t+i\omega}{\sqrt{2}}\right)^2}\,dt\right)$$

Naively, one would make the change of variable $z = t\mp i\omega$, but then our integrals get shifted into the complex plane. To mitigate that, we simply note that our integrands are entire functions, so by Cauchy's theorem their integrals around any closed contour vanish. So if we integrate around rectangular contours with horizontal sides $[-R,R]$ and $[-R\mp i\omega, R\mp i\omega]$ (together with the adjoining vertical segments), we get a value of zero.

It is not hard to argue that the value of the contour integral along the vertical segments goes to zero as $R$ tends to infinity, thus the integral of our function along the real axis is equal to its integral on the shifted axis after a change of variable. This justifies the naive change of variable.

As such, we are left with evaluating

$$\mathcal{F}f(\omega) = \frac{C}{2}e^{-\frac{\omega^2}{2}}\left(\int_{-\infty}^{\infty} (t+i\omega) e^{-\frac{t^2}{2}}\,dt - \int_{-\infty}^{\infty} (t - i\omega) e^{-\frac{t^2}{2}}\,dt \right).$$

Making use of the oddness of $te^{-\frac{t^2}{2}}$, this simplifies immediately to

$$\mathcal{F}f(\omega) = iC\omega e^{-\frac{\omega^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\,dt = i\sqrt{2\pi}C\omega e^{-\frac{\omega^2}{2}}.$$

Since $f$ is actually an eigenfunction of $\mathcal{F}$ with eigenvalue $i\sqrt{2\pi}C$, the only way for $\mathcal{F}$ to be an isometry on $L^2(\mathbb{R})$ is if $C = \pm \frac{1}{\sqrt{2\pi}}$. Thus we obtain the following functional form for $\varphi$:

$$\varphi(\omega, t) = \frac{1}{\sqrt{2\pi}} e^{\pm i \omega t}$$

and thus the Fourier transform emerges. The $\pm$ is to be expected since the Fourier transform is only unique up to a sign in the exponent.
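As a closing sanity check (again only a sketch assuming numpy and a Riemann-sum approximation of the integrals): with $C = \frac{1}{\sqrt{2\pi}}$, the transform of the odd test function $f(t) = te^{-\frac{t^2}{2}}$ should come out to $i\omega e^{-\frac{\omega^2}{2}}$, which has the same $L^2$ norm as $f$.

```python
import numpy as np

C = 1 / np.sqrt(2 * np.pi)
t = np.linspace(-20, 20, 40001)
dt = t[1] - t[0]
f = t * np.exp(-t**2 / 2)                              # odd test function

omegas = np.linspace(-20, 20, 2001)
domega = omegas[1] - omegas[0]
# F f(omega) = i*C * integral of sin(omega*t) f(t) dt; the cosine part vanishes by oddness.
Ff = np.array([1j * C * np.sum(np.sin(w * t) * f) * dt for w in omegas])

print(np.allclose(Ff, 1j * omegas * np.exp(-omegas**2 / 2), atol=1e-6))  # expected transform
print(np.sum(np.abs(f)**2) * dt, np.sum(np.abs(Ff)**2) * domega)         # both approx sqrt(pi)/2
```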

Friday, July 24, 2015

An Axiomatic Approach to the Complex Numbers

One question which comes up quite frequently regarding the complex numbers is: why do we define them and their operations the way we do? Everyone knows the old adage about solving quadratic and cubic equations and setting $i^2 = -1$, but in my opinion this doesn't really capture the why of complex numbers. In a previous post, I established the complex numbers as the field $\mathbb{R}[x]/(x^2+1)$. This is a beautiful way to view complex numbers if you happen to know algebra quite well; if you do not, then this is not a very satisfactory characterization. This raises the question: what is the most natural characterization of the complex numbers with the least technical overhead? Recently, I found a characterization that I think captures the essence of complex numbers quite well and should appeal to many people's sensibilities.

In this post, I will go through the characterization and derivation. I have not seen this presented elsewhere as an axiomatic approach to the complex numbers; while it is surely not novel, it has a nice simplicity and elegance to it.

Characterizing $\mathbb{C}$

We know from experience that the complex numbers are elements of the form $x+iy$. Alternatively, we can - and often do - view complex numbers as points in the $\mathbb{R}^2$ plane. We associate $x+iy$ with the coordinate pair $(x,y)$. In the coordinate pair representation of complex numbers, complex multiplication is given by $(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2, x_1y_2 + x_2y_1)$. Our goal is to derive this from reasonable axioms. The coordinate pair representation (i.e. the $\mathbb{R}^2$ representation) will be the starting point.

The coordinate plane $\mathbb{R}^2$ is in fact a vector space over $\mathbb{R}$. It is a standard exercise for first time linear algebra students to prove this. What this means is that $\mathbb{R}^2$ comes equipped with two operations: vector addition and scalar multiplication. These operations must satisfy the usual vector space axioms.

Vector addition is defined as so: $(x_1,y_1) + (x_2,y_2) = (x_1+x_2,y_1+y_2)$. Scalar multiplication is defined by the expression $\alpha(x,y) = (\alpha x,\alpha y)$. These two serve as part of the platform for defining the complex numbers.

Before we start deriving $\mathbb{C}$, we must first consider how vector spaces and fields differ. Every field is a vector space over itself, so we know that there is some non-trivial relationship between the two. However, not every vector space can be turned into a field. What separates vector spaces from fields is that fields carry an additional structure: multiplication. You can of course define a multiplication on a vector space (e.g. $(x_1,\ldots, x_n)(y_1,\ldots, y_n) = (x_1y_1,\ldots,x_ny_n)$), but it may not always give a meaningful or useful structure. Every nonzero element in a field must have a multiplicative inverse, multiplication must be commutative, and multiplication must distribute over addition. Since we already know that $\mathbb{C}$ is somehow constructed from $\mathbb{R}^2$, what we are tasked with is deriving the multiplication on $\mathbb{C}$, i.e. $\mathbb{C}$ is a fieldification of $\mathbb{R}^2$.

Exercise: Check that the multiplication given in the preceding paragraph does not give a field on $\mathbb{R}^n$ except when $n=1$.

Let us now list some observations/assumptions:

  1. We can identify $\mathbb{R}$ with the $x$ axis in $\mathbb{R}^2$. Meaning that if we multiplied $(x_1,0)$ with $(x_2,0)$, then we should get $(x_1x_2,0)$ since this is how we multiply real numbers.
  2. Since $\mathbb{R}^2$ is a vector space over $\mathbb{R}$, we can think of the scalar multiplication equation $\alpha(x,y) = (\alpha x,\alpha y)$ as being equivalent to $(\alpha, 0)(x,y) = (\alpha x,\alpha y)$.
  3. Since we want a field structure, we need that multiplication is commutative, that is to say that $(x_1,y_1)(x_2,y_2) = (x_2,y_2)(x_1,y_1)$. Moreover, every nonzero element has a multiplicative inverse.
  4. It's natural to want that if $(x_1,y_1)$ is a unit vector (in the Euclidean metric, meaning it lies on the unit circle), then $(x_1,y_1)(x_2,y_2)$ has the same length as $(x_2,y_2)$. This is a generalization of the fact that $|1\cdot x| = |x| = |-1 \cdot x|$, i.e. if you multiply $x$ by a unit vector ($1$ or $-1$ in $\mathbb{R}$), then its length is unchanged.
The first two properties can be thought of as pushing the algebraic structure of $\mathbb{R}$ to $\mathbb{R}^2$; the third is a necessary condition for a field structure; the fourth however is a geometric condition which really captures the essence of the complex numbers. Any field extension of $\mathbb{R}$ (that is a field which contains $\mathbb{R}$) is going to satisfy the first three properties, but there are many fields that do this. It turns out that they're all isomorphic, but these other fields lack the geometry of complex numbers. Our fairly modest fourth condition is what clinches the deal.

Deriving Complex Multiplication

Since we need our multiplication to distribute over addition, it must be of the form

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} \alpha_1 x_1x_2 + \alpha_2 x_1 y_2 + \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ \beta_1 x_1x_2 + \beta_2 x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right).$$

The reason for this is that since the map $(x_2,y_2)\mapsto(x_1,y_1)(x_2,y_2)$ distributes over addition and scalar multiplication (by the field axioms), it is actually a linear map. Every linear map must be of the above form. Prove this yourself if you are not convinced. The constants $\alpha_1,\ldots,\alpha_4,\beta_1,\ldots,\beta_4$ must be independent of $(x_1,y_1)$ and $(x_2,y_2)$ in order for multiplication to be well-defined. Thus they are universal to our problem at hand.

Let us consider then what happens when we enforce our first property. If we multiply two "real" numbers (i.e. numbers of the form $(x,0)$), then our multiplication equation becomes

$$\left(\begin{array}{c} x_1 \\ 0 \end{array}\right)\left(\begin{array}{c} x_2 \\ 0\end{array}\right) = \left(\begin{array}{c} \alpha_1 x_1x_2 \\ \beta_1 x_1x_2\end{array}\right)$$

since $y_1 = 0 = y_2$. However we stated that we should get $(x_1x_2,0)$ when we multiply $(x_1,0)$ and $(x_2,0)$. This gives that $\alpha_1 = 1$ and $\beta_1 = 0$ by equating our components.

Let us now consider what happens when we impose our second property. Multiplying $(x_1,0)$ and $(x_2,y_2)$ gives

$$\left(\begin{array}{c} x_1 \\ 0\end{array}\right)\left(\begin{array}{c} x_2 \\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 + \alpha_2 x_1y_2 \\ \beta_2 x_1y_2\end{array}\right).$$

Our second condition stated that this should give us $(x_1 x_2, x_1y_2)$. Equating components forces $\alpha_2 = 0$ and $\beta_2 = 1$. With these two properties, we've significantly simplified our general expression for multiplication to the following:

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 +  \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right).$$

By imposing our third condition of $(x_1,y_1)(x_2,y_2) = (x_2,y_2)(x_1,y_1)$, we must have that the following is true:

$$\left(\begin{array}{c} x_1x_2 +  \alpha_3 y_1x_2 + \alpha_4 y_1y_2 \\ x_1 y_2 +\beta_3 y_1x_2 +\beta_4 y_1 y_2\end{array}\right) = \left(\begin{array}{c} x_1x_2 + \alpha_3 x_1y_2 + \alpha_4 y_1y_2 \\ x_2y_1 + \beta_3 x_1y_2 + \beta_4 y_1y_2\end{array}\right).$$

Equating the first components forces $\alpha_3 y_1x_2 = \alpha_3 x_1y_2$ for all choices of the variables, so $\alpha_3 = 0$; equating the second components forces $(1-\beta_3)x_1y_2 = (1-\beta_3)x_2y_1$, so $\beta_3 = 1$. We are left with the following expression for our multiplication:

$$\left(\begin{array}{c} x_1 \\ y_1\end{array}\right)\left(\begin{array}{c} x_2\\ y_2\end{array} \right) = \left(\begin{array}{c} x_1x_2 +  \alpha_4 y_1y_2 \\ x_1y_2 + y_1x_2 + \beta_4 y_1 y_2\end{array}\right).$$

At this point, all that is left to determine are $\alpha_4$ and $\beta_4$. Before narrowing these two down to - hopefully - get $\mathbb{C}$ out of the mix, let us see where we stand. The second part of our third condition is that every nonzero element has a multiplicative inverse. Particularly, this means that $(x_1,y_1)(x_2,y_2) = (0,0)$ if and only if one or both of $(x_1,y_1)$ and $(x_2,y_2)$ are $(0,0)$.

Viewing multiplication by $(x_1,y_1) \neq (0,0)$ as a linear mapping again, the above argument gives that the kernel of the induced linear map is trivial, i.e. its determinant is nonzero. The linear map in question is of the form

$$\left(\begin{array}{cc} x_1 & \alpha_4 y_1 \\ y_1 & x_1 + \beta_4 y_1\end{array}\right).$$

The determinant of this matrix is $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2$. If the matrix is to be invertible for every $(x_1,y_1)\neq(0,0)$, then the determinant can never be zero, i.e. setting $x_1^2 + \beta_4 x_1y_1 - \alpha_4 y_1^2 = 0$ must give no solution. When $y_1 = 0$ this is automatic, so suppose $y_1\neq 0$ and view the determinant as a quadratic in $x_1$. It has no real root precisely when its discriminant is negative (which corresponds to non-real solutions, though this is putting the cart before the horse). The discriminant in this case is $\beta_4^2 y_1^2 + 4\alpha_4 y_1^2$.

Since $y_1\neq 0$, we have that $\beta_4^2y_1^2 + 4\alpha_4 y_1^2 < 0$ if and only if $\beta_4^2 + 4\alpha_4 < 0$. Equivalently, $\alpha_4 < -\frac{\beta_4^2}{4}$. Picking any $\alpha_4$ and $\beta_4$ which satisfy this condition would necessarily give rise to a field structure as we desired. However, as stated above, these fields need not capture the geometry we would like to associate to the complex numbers. As such, it is not obvious from the first three axioms alone what we should pick as the definition of the complex numbers. Enter our fourth property.
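As an aside, the zero-divisor criterion just derived is easy to probe numerically. The following is a small sketch assuming numpy (the function name `mult` and the sample parameters are mine, purely for illustration): a choice with $\beta_4^2 + 4\alpha_4 < 0$ produces no zero divisors, while a choice violating it does.

```python
import numpy as np

def mult(u, v, a4, b4):
    """Product derived from the first three axioms, parametrized by alpha_4 and beta_4."""
    x1, y1 = u
    x2, y2 = v
    return np.array([x1 * x2 + a4 * y1 * y2,
                     x1 * y2 + y1 * x2 + b4 * y1 * y2])

# alpha_4 = 1, beta_4 = 0 violates beta_4^2 + 4*alpha_4 < 0 and admits zero divisors:
print(mult((1.0, 1.0), (1.0, -1.0), 1.0, 0.0))         # (0, 0) from two nonzero vectors

# alpha_4 = -1, beta_4 = 0 satisfies the criterion; random nonzero products stay nonzero:
rng = np.random.default_rng(0)
for _ in range(10000):
    u, v = rng.normal(size=2), rng.normal(size=2)
    assert np.linalg.norm(mult(u, v, -1.0, 0.0)) > 0
```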

Our fourth property - while seemingly very modest - is actually a very strong condition. It forces a lot of geometry into our field. Let us now invoke our fourth property in two different ways. Consider $(0,1)$. This is a unit vector so $(0,1)(0,1)$ should have length one. When we multiply these what we get is $(\alpha_4, \beta_4)$. The only way for this vector to have length one is if $\alpha_4^2 + \beta_4^2 = 1$, i.e. $\alpha_4 = \pm \sqrt{1-\beta_4^2}$.

With a concrete relationship between $\alpha_4$ and $\beta_4$, we can now sort out what the values must be. Let us multiply $(0,1)$ and $(\cos\theta,\sin\theta)$. Both of these are unit vectors so the length of the resulting vector must be one as above. Multiplying the vectors, we get $\left(\pm\sqrt{1-\beta_4^2}\sin\theta, \cos\theta + \beta_4\sin\theta\right)$.

The squared length of this vector is $(1-\beta_4^2)\sin^2\theta + \cos^2\theta + 2\beta_4\sin\theta\cos\theta + \beta_4^2\sin^2\theta$, which simplifies to $1 + \beta_4\sin2\theta$. This can only be $1$ if $\beta_4 = 0$ or $\theta$ is a multiple of $\frac{\pi}{2}$. Picking any $\theta$ other than those values, it follows that $\beta_4$ must be zero since $\beta_4$ must be universal. Thus $\alpha_4 = \pm 1$. It is quite straightforward to show that the only choice for $\alpha_4$ which preserves lengths is $\alpha_4 = -1$.

Thus the multiplication on $\mathbb{R}^2$ we posited reduces to

$$(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2, x_1y_2 + y_1x_2)$$

which is the usual multiplication on $\mathbb{C}$! We also get that $(0,1)(0,1) = (-1,0)$ which is a restatement of $i^2 = -1$. Moreover, in this procedure the matrix representation of complex numbers fell right out quite naturally in our considerations.
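All four conditions can be spot-checked directly against this product. A minimal sketch assuming numpy (the helper `cmult` is mine):

```python
import numpy as np

def cmult(u, v):
    """The multiplication derived above: (x1, y1)(x2, y2) = (x1*x2 - y1*y2, x1*y2 + y1*x2)."""
    x1, y1 = u
    x2, y2 = v
    return np.array([x1 * x2 - y1 * y2, x1 * y2 + y1 * x2])

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, y1, x2, y2, a = rng.normal(size=5)
    theta = rng.uniform(0, 2 * np.pi)
    u, v = np.array([x1, y1]), np.array([x2, y2])
    unit = np.array([np.cos(theta), np.sin(theta)])

    assert np.allclose(cmult((x1, 0.0), (x2, 0.0)), [x1 * x2, 0.0])       # property 1: real axis
    assert np.allclose(cmult((a, 0.0), v), a * v)                          # property 2: scalar multiplication
    assert np.allclose(cmult(u, v), cmult(v, u))                           # property 3: commutativity
    assert np.isclose(np.linalg.norm(cmult(unit, v)), np.linalg.norm(v))   # property 4: unit vectors preserve length
print("all four properties hold on the random samples")
```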

The other choices of $\alpha_4$ and $\beta_4$ give rise to "different" fields, though really they are equivalent to $\mathbb{C}$ and the reason is that by virtue of having a negative discriminant, we assured that the polynomial equation that they solve is not solvable over $\mathbb{R}$. Quotienting $\mathbb{R}[x]$ by the polynomial would give a degree two field extension which is isomorphic to $\mathbb{C}$.

Thursday, June 25, 2015

Putting group and ring structures on arbitrary sets

Some months ago, I found myself wondering about abstract sets and how exactly we can put group structures on them. Groups are often presented as known quantities and there are plenty of examples, e.g. $\mathbb{Z}$ and $\mathbb{R}$. When developing mathematical structures, it is of interest to know how common they are, how broad they are, and whether we can construct them. This means we must ask when a given set and a given structure are compatible.

In the case of finite fields, we know that any finite field must have a specific number of elements; particularly, the number of elements must be a prime power, i.e. $p^k$ for some prime $p$. Thus if we were given an arbitrary finite set, there is no guarantee that we can put a field structure on the set. This tells us that field structures are very restrictive. We could ask a few weaker questions: given an arbitrary set, can we put a group structure on it? Could we put a ring structure on it? Or are these structures somehow too rigid to accommodate an arbitrary set? Surely if we cannot put a group structure on a particular set, we cannot put a ring structure on that same set. Let us first address the first question.

Putting a group structure on an arbitrary set

Finite sets

Let us first consider the finite setting. Suppose $G = \{g_0, \ldots, g_{n-1}\}$ is a finite set. The elements $g_i$ have no algebraic properties a priori since $G$ is an abstract set. If we were to put a group structure on $G$, the most obvious choice is to associate $G$ to $\mathbb{Z}_n$. To do so, simply map $g_i$ to $i$ and inherit the group operation. In more concrete terms, let $\cdot:G\times G\to G$ be defined by $g_i\cdot g_j = g_{(i+j)\pmod n}$.

It's not hard to check that this does indeed define a group. Associativity is taken care of by noting that $(g_i\cdot g_j)\cdot g_k = g_{(i+j+k)\pmod n} = g_i\cdot (g_j\cdot g_k)$ which follows from elementary properties of modular arithmetic. $g_0$ is an identity element since $g_0\cdot g_i = g_{(0+i)\pmod n} = g_i$. Likewise, inverses exist since we have $g_i\cdot g_{n-i} = g_{(i+n-i)\pmod n} = g_0$.
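This construction is short enough to write down and test by brute force. Here is a minimal sketch in Python; the set $G$ and its element labels are arbitrary placeholders.

```python
from itertools import product

G = ["a", "b", "c", "d", "e"]             # an arbitrary finite set
n = len(G)
index = {g: i for i, g in enumerate(G)}   # the identification g_i <-> i

def op(g, h):
    """The group law inherited from Z_n: g_i . g_j = g_{(i+j) mod n}."""
    return G[(index[g] + index[h]) % n]

# Check the group axioms by brute force.
assert all(op(op(g, h), k) == op(g, op(h, k)) for g, h, k in product(G, repeat=3))  # associativity
assert all(op(G[0], g) == g for g in G)                                             # g_0 is an identity
assert all(any(op(g, h) == G[0] for h in G) for g in G)                             # inverses exist
```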

The case of finite sets is pretty simple! Note that it was not claimed in the above construction that the group structure is unique. For some cardinalities it is unique, but that isn't the topic of this post. If for instance $|G| = 4$, we could put two different group structures on $G$: that of $\mathbb{Z}_4$ and that of the Klein four-group.

As is the common theme in set theory, we build up in layers to get an intuition for how things should proceed. After covering the case of finite sets, the next most obvious step is to consider countable sets.

Countable sets

Suppose now that $G$ is a countable set. Let $\{\ldots, g_{-2}, g_{-1}, g_0, g_1, g_2, \ldots\}$ be an enumeration of $G$. Such an enumeration exists since $G$ is countable. Again we can define a binary operation on $G$; this time, let $\cdot:G\times G\to G$ be given by $g_i\cdot g_j = g_{i+j}$.

Again this defines a group structure on $G$. Associativity follows from associativity of the addition of integers: $(g_i\cdot g_j)\cdot g_k = g_{i+j+k} = g_i\cdot(g_j\cdot g_k)$. The identity element is very obviously $g_0$ and the inverse of $g_i$ is clearly $g_{-i}$.

An aside about set theory and the axiom of choice

The next step might be to consider uncountable sets but there are not many uncountable groups that we work with in nature. The most common by far are $\mathbb{R}$ and $\mathbb{C}$ (and subgroups thereof). These are uncountable groups but their cardinality is only $2^{|\mathbb{N}|}$. There are however sets that are "more" infinite than the reals and complexes (for instance the set of all subsets of real numbers has strictly greater cardinality).

Thus we cannot appeal directly to a known group to address this issue in full generality. The next best thing is to come up with some set-theoretic approach to address the problem at hand. At this stage, we must state a little about what model of set theory we are using. By far the most widely accepted model is Zermelo-Fraenkel set theory. Zermelo-Fraenkel (ZF) set theory is often not quite strong enough in mathematics and an additional axiom is imposed: the axiom of choice.

The axiom of choice allows for many nice things in mathematics (e.g. every vector space has a basis and the Cartesian product of nonempty sets is again nonempty) but also introduces some somewhat unsavory consequences (e.g. Banach-Tarski paradox and similar results). Many proofs involving the axiom of choice are existence proofs, i.e. a certain object is proved to exist but there is no way of knowing what that object truly looks like. As such, the axiom of choice is a bit of a contentious piece of mathematics, but most mathematicians accept it as a valid axiom despite its pitfalls.

The axiom of choice formally states that if one has an indexed set of nonempty sets $(S_i)_{i\in I}$, then there is an indexed set $(x_i)_{i\in I}$ such that $x_i\in S_i$ for each $i$. Informally, this can be thought of in the following way: given a collection of nonempty sets, one can pick an element from each set at once. This seems intuitively obvious and in the case of a finite number of sets it is indeed obvious; however in the case of an infinite number of sets, it is not guaranteed. The axiom of choice plays a critical role in what follows.

Arbitrary infinite sets

Suppose now that $G$ is an arbitrary infinite set. Since we cannot appeal to groups we're familiar with, all we have at our disposal in terms of operations are the set operations and subsets of $G$. Somehow we must endeavor to construct a group structure using only these ideas.

Since groups only come with one binary operation, we must choose the correct set operation. For instance, unions and intersections are "one way" processes in some sense. More concretely, given an arbitrary set $B$, $A$ cannot be obtained from $A\cup B$ using only unions. As such, inverses wouldn't exist if we chose our group operation to be set union. Similar logic applies to set intersection. Set difference is not a good option either since it is not associative.

Hence we must look to "higher order" set operations, i.e. operations built from multiple set operations. The simplest such operation is the symmetric difference: $A\oplus B = (A\setminus B)\cup (B\setminus A)$. Visually, the symmetric difference is the shaded region of the usual Venn diagram: the parts of $A$ and $B$ that do not overlap. Note that $A\oplus\emptyset = A$ for any set $A$. It is not difficult to prove that $\oplus$ is associative. It is also clear that $A\oplus A = \emptyset$. Thus $\oplus$ seems like a good operation with which to define a group structure on an arbitrary set.

Thus far we have not used the axiom of choice; it is at this stage that we do. It is a consequence of the axiom of choice that the set of all finite subsets of an infinite set has the same cardinality as the set itself. The proof of this is not too difficult and relies on the Cantor-Schroeder-Bernstein theorem and some cardinal arithmetic. The axiom of choice is packed into the cardinal arithmetic that is required.

Denote the set of all finite subsets of the set $X$ by $\mathcal{F}(X)$. By the above, there is a bijection between $G$ and $\mathcal{F}(G)$. Let this bijection be denoted $f$. $\mathcal{F}(G)$ can be turned into a group with the operation of symmetric difference; note that the symmetric difference of two finite sets is again finite. This however is a group structure on $\mathcal{F}(G)$, not yet on $G$.

To get a group structure on $G$, note that every $g\in G$ can be written as $f^{-1}(S)$ where $S\in\mathcal{F}(G)$. Let $\cdot$ denote the binary operation on $G$. Suppose that $g_1 = f^{-1}(S_1)$ and $g_2 = f^{-1}(S_2)$. Define $g_1\cdot g_2$ to be $f^{-1}(S_1\oplus S_2)$.

We have to show that this does indeed define a group structure on $G$. Consider $(g_1\cdot g_2)\cdot g_3$. This is nothing more than $f^{-1}(S_1\oplus S_2\oplus S_3)$, which is of course $g_1\cdot(g_2\cdot g_3)$. The identity element is the inverse image of the empty set since if $e = f^{-1}(\emptyset)$, then $e\cdot g = f^{-1}(\emptyset\oplus S) = f^{-1}(S) = g$. Inverses also exist: if $g = f^{-1}(S)$, then $g\cdot g = f^{-1}(S\oplus S) = f^{-1}(\emptyset) = e$.

Consequently $(G,\cdot)$ is indeed a group. In fact it is an abelian group since symmetric difference is, well, symmetric: $f^{-1}(S_1)\cdot f^{-1}(S_2) = f^{-1}(S_1\oplus S_2) = f^{-1}(S_2\oplus S_1) = f^{-1}(S_2)\cdot f^{-1}(S_1)$.
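Although the bijection $f$ is thoroughly non-constructive for a genuinely arbitrary infinite set, the group law itself is easy to experiment with. Here is a small sketch in Python using frozensets and the built-in `^` (symmetric difference) operator, checking the axioms on random finite subsets of a stand-in universe.

```python
import random

random.seed(0)
universe = range(100)                     # a stand-in universe; finite subsets play the role of F(G)

def random_subset():
    return frozenset(x for x in universe if random.random() < 0.1)

e = frozenset()                           # the identity: (the preimage of) the empty set
for _ in range(1000):
    A, B, C = random_subset(), random_subset(), random_subset()
    assert (A ^ B) ^ C == A ^ (B ^ C)     # associativity
    assert A ^ e == A                     # identity
    assert A ^ A == e                     # every element is its own inverse
    assert A ^ B == B ^ A                 # abelian
```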

In general we were Abel (haha) to put a group structure on an arbitrary set (assuming the axiom of choice). Not only that, we were able to put an abelian group structure on the set. This is important for the ring portion of the secondary question: if we were not able to guarantee that, we would have no hope of addressing the question of putting a ring structure on an arbitrary set.

Putting a ring structure on an arbitrary set

Finite sets

When considering the question of being able to put a group structure on an arbitrary set, we had to differentiate the finite and infinite settings. Unsurprisingly, we must do the same for the ring case. If $R$ is a finite set with $|R| = n$, then we could put an abelian group structure on $R$ by associating it to $\mathbb{Z}_n$. As is well-known, $\mathbb{Z}_n$ is already a unital ring! We don't have any further work to do in this case. The infinite case is much more interesting.

Infinite sets

Above we constructed an abelian group structure on an arbitrary infinite set, so we need only construct the multiplicative part of the ring structure. Let $R$ be an infinite set. We lifted the group structure on $R$ from $\mathcal{F}(R)$ with symmetric difference. It stands to reason that to construct a ring structure on $R$, we should first put a ring structure on $\mathcal{F}(R)$.

To do this, a multiplication operation must be devised. Way back when we first discussed the elementary set operations, I likened them to the usual arithmetic operations. Union is a bit like addition and intersection is a bit like multiplication. For this reason, intersection seems to be the logical choice for the second ring operation of multiplication. Before showing that this does indeed work, let's consider union as our notion of multiplication.

If union is to play the role of multiplication, it must distribute over symmetric difference (addition), i.e. $A\cup (B\oplus C) = (A\cup B)\oplus (A\cup C)$. Since $A,B,C$ are arbitrary, take $A = B = C$ nonempty: the left hand side becomes $A$ whereas the right hand side is empty. Thus unions do not distribute over symmetric difference in general. As such, union cannot play the role of multiplication. Intersection, however, does exactly what we want, which can be easily verified.
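Both claims are quick to test. A small sketch in Python:

```python
import random

random.seed(1)

def random_subset():
    return frozenset(x for x in range(50) if random.random() < 0.2)

# Union fails to distribute: with A = B = C nonempty, A | (B ^ C) = A but (A | B) ^ (A | C) is empty.
A = frozenset({1, 2, 3})
print(A | (A ^ A), (A | A) ^ (A | A))

# Intersection distributes over symmetric difference.
for _ in range(1000):
    A, B, C = random_subset(), random_subset(), random_subset()
    assert A & (B ^ C) == (A & B) ^ (A & C)
```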

Therefore the natural notion of multiplication on $\mathcal{F}(R)$ is set intersection. Like before, we must pull back the multiplication to $R$. Let $r_1 = f^{-1}(S_1)$ and $r_2 = f^{-1}(S_2)$, where $f$ is the bijection between $R$ and $\mathcal{F}(R)$ as before. Let $\star$ denote the multiplication on $R$ and define it by $r_1\star r_2 = f^{-1}(S_1\cap S_2)$.

Note here that there is no multiplicative unit, since the unit for intersection would have to be $R$ itself, which is not a finite subset.

It remains to see that this does indeed define a ring structure on $R$. It must then be checked that the ring axioms are obeyed. This is tantamount to checking that multiplication distributes over addition. Consider $r_1 = f^{-1}(S_1)$, $r_2 = f^{-1}(S_2)$ and $r_3 = f^{-1}(S_3)$, then

\begin{eqnarray*}
r_1\star(r_2\cdot r_3) &=& r_1\star f^{-1}(S_2\oplus S_3) \\
&=& f^{-1}(S_1\cap(S_2\oplus S_3)) \\
&=& f^{-1}((S_1\cap S_2)\oplus (S_1\cap S_3)) \\
&=& f^{-1}(S_1\cap S_2)\cdot f^{-1}(S_1\cap S_3) \\
&=& (r_1\star r_2)\cdot (r_1\star r_3)
\end{eqnarray*}

Hence multiplication distributes over addition, and moreover $R$ is endowed with a commutative ring structure since multiplication is commutative (which can be traced back to the commutativity of set intersection).

We've set out to do exactly what we wanted! We've successfully shown that every set can be endowed with a ring (and a fortiori a group) structure. However one seeming issue is that the ring that we naturally developed in the infinite setting was non-unital. This has its roots in the fact that the unit with regards to set intersection is the universe set. Since our universe set was assumed to be infinite, it is not included in the set of all finite subsets.

It stands to reason then that we should modify our initial construction to include the whole set $R$ to begin with. However, if we do this, we immediately lose closure: if $S$ is a finite subset of $R$, then $S\oplus R = R\setminus S$, which is most definitely not a finite set. We could remedy this by considering the union of the set of finite subsets and the set of cofinite subsets of $R$. This set has the same cardinality as $R$ by cardinal arithmetic, hence there is a bijection between these sets and the same analysis goes through quite naturally. This modified approach gives rise to a unital ring structure on $R$ since now $R$ is included.

The next most logical question to ask is if division rings can be formed on arbitrary sets. This however is beyond my expertise. Based on the construction above, it seems unlikely as intersection is (as stated above) a one-way process in some sense.

Saturday, March 14, 2015

An Algebraic Construction of the Split-Complex Numbers

In a previous post, I constructed the split-complex numbers in a somewhat handwavy fashion. It went as so: suppose $j$ is a solution to $x^2=1$ but $j\neq\pm 1$, then the split-complex numbers are the numbers of the form $a+jb$. In the last post, we examined a new characterization of the complex numbers via quotients. Particularly, we wanted to find a solution to $x^2+1 = 0$. However no such solution exists in $\mathbb{R}$, so by quotienting $\mathbb{R}[x]$ by $x^2+1$, we were able to get solutions.

We know that for a polynomial $p(x)$ over a field $\mathbb{F}$, $\mathbb{F}[x]/(p(x))$ is a field if and only if $p(x)$ is irreducible over $\mathbb{F}$. Let us consider the case of $p(x) = x^2-1$. We know that this polynomial has solutions in $\mathbb{R}$: $\pm 1$. We, however, can still quotient $\mathbb{R}[x]$ by $x^2-1$. Let us see what happens if we do so. Repeating similar logic as in the case of $x^2+1$, we have that

$$\mathbb{R}[x]/(x^2-1) = \{[a+bx]:a,b\in\mathbb{R}\}$$

where $[a+bx] = \{q(x):p(x)\mid (q(x)-a-bx)\}$. Again, we have an algebra structure on $\mathbb{R}[x]/(x^2-1)$ since we can multiply polynomials. Multiplying two elements of $\mathbb{R}[x]/(x^2-1)$, we get

$$(a + bx + (p(x)))(c + dx + (p(x)))= ac + ad x + bc x + bd x^2 + (p(x)).$$

However since $x^2 \equiv 1 \pmod{x^2-1}$, it follows that $bd x^2 \equiv bd$ and so

$$(a + bx + (p(x)))(c + dx + (p(x))) = (ac + bd) + (ad + bc) x + (p(x)).$$

If you recall the structure of the split-complex numbers, $(a+jb)(c+jd) = (ac+bd)+(ad+bc)j$. Thus there is a natural isomorphism between $\mathbb{R}[x]/(x^2-1)$ and $\mathbb{H}$ as they have the same additive and multiplicative structures. Before, it was noted that not every element in $\mathbb{H}$ has an inverse; particularly, those elements living on the lines $y = \pm x$.

The ring $\mathbb{R}[x]/(x^2-1)$ is commutative but as we noted, it cannot be a field since $x^2-1$ is reducible over $\mathbb{R}$. This is in exact agreement with what we showed before. Thus the general algebraic theory gave to us quite naturally the fact that $\mathbb{H}$ is not a field. Moreover, we have a matrix representation for the split-complex numbers by the same logic as in the previous post:

$$a+jb \Longleftrightarrow \left(\begin{array}{cc} a & b \\ b & a\end{array}\right).$$

The matrix corresponding to complex numbers is

$$a+ib \Longleftrightarrow \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right)$$

which is invertible whenever $(a,b)\neq (0,0)$ since its determinant $a^2+b^2$ is then nonzero. The former matrix is not invertible whenever $a^2 = b^2$, since then its determinant $a^2-b^2$ is zero. This again agrees with the previous analysis.
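Here is a short numerical illustration of the matrix representation and of the non-invertible elements on the lines $a = \pm b$. It is only a sketch assuming numpy; the helper `split_complex` is mine.

```python
import numpy as np

def split_complex(a, b):
    """Matrix representation of the split-complex number a + j*b."""
    return np.array([[a, b], [b, a]])

# j^2 = 1: squaring the matrix representing j gives the identity.
j = split_complex(0.0, 1.0)
print(j @ j)                                    # [[1, 0], [0, 1]]

# Multiplication matches (a + j*b)(c + j*d) = (ac + bd) + j*(ad + bc).
a, b, c, d = 2.0, 3.0, -1.0, 4.0
print(split_complex(a, b) @ split_complex(c, d))
print(split_complex(a * c + b * d, a * d + b * c))

# Elements with a^2 = b^2 have determinant zero and hence no inverse.
print(np.linalg.det(split_complex(5.0, 5.0)))   # 0.0
```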

Tuesday, March 10, 2015

Fields, Polynomials and Matrices

On a homework assignment, we were asked, given a field $\mathbb{F}$ (not necessarily algebraically closed) and a polynomial $p(x)\in\mathbb{F}[x]$ of degree $n$, to find a matrix $A\in\mathbb{F}_{n\times n}$ such that $p(A) = 0$. Before going into this problem in detail, I wish to address a specific example to see how this might come about.

The Curious Case of $x^2+1$ and the Complex Numbers


A common result in algebra is that the complex numbers can be represented by $2\times 2$ real matrices, particularly

$$a+ib \Longleftrightarrow \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right).$$

What do we mean by "represented" here? In short, the algebraic operations are preserved exactly. What this means precisely depends on the context. For our purposes, we are primarily interested in fields, so I will explain it in this setting.

Definition Let $\mathbb{F}_1$ and $\mathbb{F}_2$ be fields. A field homomorphism is a map $\varphi:\mathbb{F}_1\to\mathbb{F}_2$ such that the following are true:
  1. $\varphi(0) = 0$
  2. $\varphi(1) = 1$
  3. $\varphi(a+b) = \varphi(a)+\varphi(b)$
  4. $\varphi(ab) = \varphi(a)\varphi(b)$
The complex numbers with addition defined by the relation $(a+ib)+(c+id) = (a+c)+i(b+d)$ and multiplication defined by the relation $(a+ib)(c+id) = (ac-bd)+i(ad+bc)$ make $\mathbb{C}$ into a field. If we consider the set of matrices

$$\widetilde{\mathbb{C}} = \left\{\left(\begin{array}{rr} a & -b \\ b & a\end{array}\right): a,b\in \mathbb{R} \right\}$$

then $\widetilde{\mathbb{C}}$ can be turned into a field with the usual matrix addition and multiplication as well as the identification that $0$ is the zero matrix and $1$ is the identity matrix. We claim that the map $a+ib \to \left(\begin{array}{rr} a & -b \\ b & a \end{array} \right)$ defines a field homomorphism. Let this map be denoted by $\varphi$. Then clearly $\varphi(0) = 0$ and $\varphi(1) = 1$. Additionally,

\begin{eqnarray*} \varphi((a+ib)+(c+id)) &=& \varphi((a+c)+i(b+d)) \\ &=& \left(\begin{array}{rr} a+c & -b-d \\ b+d & a+c\end{array}\right) \\ &=& \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right) + \left(\begin{array}{rr} c & -d \\ d & c\end{array}\right) \\ &=& \varphi(a+ib) + \varphi(c+id). \end{eqnarray*}

and likewise

\begin{eqnarray*} \varphi((a+ib)(c+id)) &=& \varphi((ac-bd)+i(ad+bc)) \\ &=& \left(\begin{array}{rr} ac-bd & -ad-bc \\ ad+bc & ac-bd\end{array}\right) \\ &=& \left(\begin{array}{rr} a & -b \\ b & a \end{array}\right)\left(\begin{array}{rr} c & -d \\ d & c\end{array}\right) \\ &=& \varphi(a+ib)\varphi(c+id). \end{eqnarray*}

Thus, $\varphi$ is a field homomorphism as desired. This means that we can indeed view complex numbers as being $2\times 2$ matrices of the above form since we lose no information when doing so. In fact, the two are effectively equivalent; they are just different representations of the same algebraic structure.

The matrices $\left(\begin{array}{rr} a & -b \\ b & a\end{array}\right)$ are of a suggestive form. If we let $a = r\cos\theta$ and $b = r\sin\theta$ as per the polar form of complex numbers, the corresponding matrix becomes $r\left(\begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\right)$. Note that this is just $r$ multiplying the rotation matrix. In fact, the geometric interpretation of complex numbers is often the way in which the matrix representation of complex numbers is established.
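A quick numerical spot-check of the homomorphism property; this is only a sketch assuming numpy, with the map named `phi` as above.

```python
import numpy as np

def phi(z):
    """The map a + ib  ->  [[a, -b], [b, a]]."""
    return np.array([[z.real, -z.imag], [z.imag, z.real]])

rng = np.random.default_rng(2)
for _ in range(1000):
    z = complex(*rng.normal(size=2))
    w = complex(*rng.normal(size=2))
    assert np.allclose(phi(z + w), phi(z) + phi(w))   # addition is preserved
    assert np.allclose(phi(z * w), phi(z) @ phi(w))   # multiplication is preserved
```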

Polynomial Rings, Zeroes and Fields


A trend in modern mathematics is to find abstract and more algebraic ways in which to view common facts. The reason being that the ideas contained in the abstract and algebraic approaches, without appealing to graphical arguments or intuition, can give rise to new ideas. How does this all relate to the idea posed in the outset of this post? We know that $i^2 + 1 = 0$ by definition. From the matrix representation of complex numbers, we know that $i$ corresponds to the matrix $\left(\begin{array}{rr} 0 & -1 \\ 1 & 0\end{array}\right)$. Squaring this matrix, we get $\left(\begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array}\right)$. Thus, the matrix corresponding to $i$ also solves $x^2 + 1 = 0$.

The obvious question to ask is how to derive the representation $\left(\begin{array}{rr} 0 & -1 \\ 1 & 0\end{array}\right)$ without appealing to geometric arguments, using solely algebraic ones. To answer this question in meaningful detail, we need to consider a different, equivalent definition of the complex numbers. To do so, we need some results regarding polynomial rings and what are known as irreducible polynomials. First some definitions.

Definition Let $\mathbb{F}$ be a field. The polynomial ring over $\mathbb{F}$, denoted $\mathbb{F}[x]$, is given by

$$\mathbb{F}[x] = \{a_0+a_1x+\cdots+a_nx^n: a_0,\ldots,a_n\in\mathbb{F}, n\in\mathbb{N}\},$$

where the $a_0+a_1x+\cdots+a_nx^n$ are understood to be formal sums. A rigorous way to view polynomials is as elements of the vector space $\mathbb{F}^{\mathbb{N}}$ such that all but finitely many coefficients are zero (with the identification $1 = e_0$, $x = e_1$, etc.).

One of the properties of polynomials we are most interested in is their roots. Given an arbitrary field and polynomial, there is no guarantee that there is a root in the field; e.g. consider $\mathbb{R}$ and $p(x) = x^2+1$. We know that there is no solution to $x^2 + 1 = 0$ in the reals. This motivates the next definition.

Definition A polynomial $p(x)\in\mathbb{F}[x]$ is said to be irreducible if it cannot be written as a product of two polynomials in $\mathbb{F}[x]$ of lower degree. In particular, an irreducible polynomial of degree at least two has no root in $\mathbb{F}$, i.e. $p(c)\neq 0$ for every $c\in\mathbb{F}$; for polynomials of degree two or three, having no root is in fact equivalent to irreducibility.

We'd like to be able to "force" roots to exist for irreducible polynomials. In the case of $p(x) = x^2+1$ and $\mathbb{F}=\mathbb{R}$, we had to adjoin $i$ in order to get a solution. This isn't very satisfactory in general since it is very dependent upon the structure of $\mathbb{R}$. We need a way to force roots in general. A common theme in mathematics is to force something to be zero by quotienting by it.

Armed with this notion, we might be tempted to consider $\mathbb{F}[x]/(p(x))$ when $p(x)$ is irreducible in order to get a zero of $p(x)$. (Here $(p(x)) = p(x)\mathbb{F}[x]$, i.e. $(p(x))$ is the set of all multiples of $p(x)$.) In $\mathbb{F}[x]/(p(x))$ we have $p(x) \equiv 0$, so this seems reasonable, and since $\mathbb{F}[x]/(p(x))$ is a field, it is a sensible place to look for roots.

Moreover, we would think that, since we have set $p(x)$ to zero by quotienting, $x$ should be the root of $p(x)$ in $\mathbb{F}[x]/(p(x))$. Strictly speaking, $x$ is not an element of $\mathbb{F}[x]/(p(x))$, but the coset $x+(p(x))$ is. Let us check that this does indeed work. Writing $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, we have

\begin{eqnarray*} p(x+(p(x))) &=& a_0 + a_1 (x+(p(x))) + \cdots + a_n (x+(p(x)))^n \\ &=& a_0 + a_1 x + \cdots + a_n x^n + (p(x)). \end{eqnarray*}

Note that we get $p(x) + (p(x))$, but since $p(x)\in(p(x))$, this is nothing more than $(p(x))$ - which we have set to zero in $\mathbb{F}[x]/(p(x))$. Hence $x+(p(x))$ is a root of $p(x)$ in $\mathbb{F}[x]/(p(x))$.

The Algebraic Structure of $\mathbb{F}[x]/(p(x))$


Even though $\mathbb{F}[x]/(p(x))$ is a field, it is also a vector space over $\mathbb{F}$. This is easy to check. Moreover, it is $n$-dimensional: the cosets $1+(p(x)),x+(p(x)),\ldots, x^{n-1}+(p(x))$ span it (every polynomial is equivalent modulo $p(x)$ to one of degree less than $n$), and they are linearly independent, since a nontrivial linear combination of them is a nonzero polynomial of degree less than $n$ and hence cannot lie in $(p(x))$.

$\mathbb{F}[x]/(p(x))$ not only carries a vector space structure but it inherits even more structure from $\mathbb{F}[x]$ since we can multiply polynomials. This leads into the next definition.

Definition Let $V$ be a vector space over a field $\mathbb{F}$. We say that $V$ is an algebra if $V$ carries with its vector space structure a product $\cdot:V\times V\to V$ satisfying
  1. $(x+y)\cdot z = x\cdot z + y\cdot z$
  2. $x\cdot (y+z) = x\cdot y + x\cdot z$
  3. $(ax)\cdot(by) = (ab)(x\cdot y)$
for all $x,y,z\in V$ and $a,b\in\mathbb{F}$.

$\mathbb{F}[x]/(p(x))$ is an algebra over $\mathbb{F}$ with $\cdot$ denoting coset multiplication. With this, we can finally address the question of $p(x) = x^2+1$ over $\mathbb{R}$. Let us consider what happens when we act the basis elements on each other. Denoting the action by $\rhd$ and dropping the explicit mention of $(p(x))$ for now, we have $1\rhd x^k = x^k$. Additionally, we also have that $x\rhd x^k = x^{k+1}$ for $0 \le k \le n-2$ and $x\rhd x^{n-1} = x^n$. Since we are working modulo $(p(x))$, and assuming without loss of generality that $p$ is monic (i.e. $a_n = 1$), we have $x^n \equiv -a_0 - a_1 x - \cdots - a_{n-1}x^{n-1}$. We can write some matrix representations for $1+(p(x))$ and $x+(p(x))$:

$$[1+(p(x))] = \left(\begin{array}{ccc} 1 & & \\ & \ddots & \\ & & 1\end{array}\right)$$

and

$$[x+(p(x))] = \left(\begin{array}{cccc} & & & -a_0 \\ 1 & & & -a_1 \\ & \ddots & & \vdots \\ & & 1 & -a_{n-1}\end{array}\right). \tag{1}$$

Here, the empty entries are zero. The others are quite similar. The algebra defined by the matrices is equivalent to the algebra defined by $\{1+(p(x)),x+(p(x)),\ldots,x^{n-1}+(p(x))\}$. The natural definition of $p$ on the matrix algebra is

\begin{eqnarray*} p\left(\left[x^k + (p(x))\right]\right) &=&\left [p\left(x^k + (p(x))\right )\right]. \end{eqnarray*}

As such, it is manifest that $p([x+(p(x))]) = 0$. Thus, by defining $A = [x+(p(x))]$, we see that $p(A) = 0$, which answers the original question posed at the outset. If we knew that $\mathbb{F}$ was algebraically closed from the outset (or more generally just that $p(x)$ has a root in $\mathbb{F}$), then we could immediately find such an $A$.

If $p(x)$ has a root in $\mathbb{F}$, denote this root by $c$. Then let $A = \operatorname{diag}(c,\ldots,c)$. Direct computation shows that $p(A) = 0$. With all of this theory, it is instructive to return to our toy example of $p(x) = x^2+1$.
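As a concrete illustration of the construction, here is a sketch (assuming numpy and $\mathbb{F} = \mathbb{R}$) that builds the matrix of equation $(1)$ for a monic polynomial and verifies numerically that it is annihilated by $p$. The helper names are mine.

```python
import numpy as np

def companion(coeffs):
    """Matrix of equation (1) for the monic polynomial a_0 + a_1 x + ... + a_{n-1} x^{n-1} + x^n.

    `coeffs` lists (a_0, ..., a_{n-1})."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)             # subdiagonal of ones
    A[:, -1] = -np.asarray(coeffs)         # last column: -a_0, ..., -a_{n-1}
    return A

def eval_poly(coeffs, A):
    """Evaluate p(A) = a_0 I + a_1 A + ... + a_{n-1} A^{n-1} + A^n."""
    n = len(coeffs)
    result = np.linalg.matrix_power(A, n)
    for k, a in enumerate(coeffs):
        result = result + a * np.linalg.matrix_power(A, k)
    return result

print(companion([1, 0]))                                     # p(x) = x^2 + 1 gives [[0, -1], [1, 0]]
print(eval_poly([1, 0], companion([1, 0])))                  # the zero matrix
print(eval_poly([2, -3, 0, 5], companion([2, -3, 0, 5])))    # p(x) = x^4 + 5x^3 - 3x + 2: also zero
```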

An Alternate Characterization of $\mathbb{C}$


We already know that $p(x) = x^2+1$ is irreducible over $\mathbb{R}$ and that $\mathbb{R}[x]/(x^2+1)$ is a field as a result. Let us see what field this is, exactly. Since we're setting $x^2+1 = 0$ by quotienting, we should expect that $\mathbb{R}[x]/(x^2+1)$ is $\mathbb{C}$. Since $x^2 + 1$ is of degree $2$, every polynomial in $\mathbb{R}[x]$ is equivalent to $a+bx$ (modulo $x^2+1$).

Define $\varphi:\mathbb{R}[x]/(x^2+1)\to\mathbb{C}$ by $\varphi(a+bx+(p(x))) = a+ib$. Clearly this map is surjective, $\varphi(0+(p(x))) = 0$ and $\varphi(1+(p(x))) = 1$. Moreover,

\begin{eqnarray*}\varphi((a+bx+(p(x))) + (c+dx+(p(x)))) &=& \varphi((a+c) + (b+d)x + (p(x))) \\ &=& (a+c)+(b+d)i \\ &=& (a+ bi) + (c+di). \end{eqnarray*}

This last expression is clearly $\varphi(a+bx+(p(x))) + \varphi(c+dx+(p(x)))$. A similar analysis shows that

$$\varphi((a+bx+(p(x)))(c+dx+(p(x)))) = \varphi(a+bx+(p(x)))\varphi(c+dx+(p(x))).$$

This last equality is predicated on the equality $bdx^2 \equiv -bd\pmod{(x^2+1)}$. Since $\varphi$ is injective (since it is a homomorphism of fields) and is surjective, it follows that it is an isomorphism of fields. Hence we can view $\mathbb{C}$ as $\mathbb{R}[x]/(x^2+1)$.

Using our general expression for a matrix solving $p(A) = 0$ (equation $(1)$) (with $p(x) = x^2+1$ here), we have that a matrix solving $A^2 + 1 = 0$ is

$$ A = \left(\begin{array}{rr} 0 & -1 \\ 1 & 0\end{array}\right).$$

Since $x$ corresponds to $i$, we see that $i$ is represented by $A$ as we noted above. However there was no geometric argument here at all - only algebraic arguments! Since any complex number is of the form $a+ib$, we see that

$$a+ib \Longleftrightarrow \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right).$$

This surprising detour through some seemingly unrelated mathematics naturally gave us the matrix representation for complex numbers that we've seen many times before.