Mathematics Abound: Fields, Polynomials and Matrices

On a homework assignment, we were given a field $\mathbb{F}$ (not necessarily algebraically closed) and a polynomial $p(x)\in\mathbb{F}[x]$ of degree $n$, and asked to find a matrix $A\in\mathbb{F}_{n\times n}$ such that $p(A) = 0$. Before going into this problem in detail, I wish to work through a specific example to see how such a matrix might come about.

The Curious Case of $x^2+1$ and the Complex Numbers

A common result in algebra is that the complex numbers can be represented by $2\times 2$ real matrices, particularly

$$a+ib \Longleftrightarrow \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right).$$

What do we mean by "represented" here? In short, the algebraic operations are preserved exactly. What this means precisely depends on the context. We are primarily interested in fields, so I will explain it in this setting.

Definition Let $\mathbb{F}_1$ and $\mathbb{F}_2$ be fields. A field homomorphism is a map $\varphi:\mathbb{F}_1\to\mathbb{F}_2$ such that the following are true:

$\varphi(0) = 0$
$\varphi(1) = 1$
$\varphi(a+b) = \varphi(a)+\varphi(b)$
$\varphi(ab) = \varphi(a)\varphi(b)$

The complex numbers with addition defined by the relation $(a+ib)+(c+id) = (a+c)+i(b+d)$ and multiplication defined by the relation $(a+ib)(c+id) = (ac-bd)+i(ad+bc)$ make $\mathbb{C}$ into a field. If we consider the set of matrices

$$\widetilde{\mathbb{C}} = \left\{\left(\begin{array}{rr} a & -b \\ b & a\end{array}\right): a,b\in \mathbb{R} \right\}$$

then $\widetilde{\mathbb{C}}$ can be turned into a field with the usual matrix addition and multiplication as well as the identification that $0$ is the zero matrix and $1$ is the identity matrix. We claim that the map $a+ib \to \left(\begin{array}{rr} a & -b \\ b & a \end{array} \right)$ defines a field homomorphism. Let this map be denoted by $\varphi$. Then clearly $\varphi(0) = 0$ and $\varphi(1) = 1$. Additionally,

\begin{eqnarray*} \varphi((a+ib)+(c+id)) &=& \varphi((a+c)+i(b+d)) \\ &=& \left(\begin{array}{rr} a+c & -b-d \\ b+d & a+c\end{array}\right) \\ &=& \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right) + \left(\begin{array}{rr} c & -d \\ d & c\end{array}\right) \\ &=& \varphi(a+ib) + \varphi(c+id). \end{eqnarray*}

and likewise

\begin{eqnarray*} \varphi((a+ib)(c+id)) &=& \varphi((ac-bd)+i(ad+bc)) \\ &=& \left(\begin{array}{rr} ac-bd & -ad-bc \\ ad+bc & ac-bd\end{array}\right) \\ &=& \left(\begin{array}{rr} a & -b \\ b & a \end{array}\right)\left(\begin{array}{rr} c & -d \\ d & c\end{array}\right) \\ &=& \varphi(a+ib)\varphi(c+id). \end{eqnarray*}

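Thus, $\varphi$ is a field homomorphism as desired. It is also a bijection onto $\widetilde{\mathbb{C}}$: every matrix in $\widetilde{\mathbb{C}}$ is the image of some $a+ib$ by construction, and $a$ and $b$ can be read off from the first column of the matrix. This means that we can indeed view complex numbers as being $2\times 2$ matrices of the above form since we lose no information when doing so. In fact, the two are effectively equivalent; they are just different representations of the same algebraic structure.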
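The two computations above can be spot-checked numerically. The following is a minimal sketch in Python, not a proof: the helper names `to_matrix`, `mat_add` and `mat_mul` are made up for this illustration, and the check only covers the sample values chosen.

```python
def to_matrix(z):
    """Send a + ib to the 2x2 matrix ((a, -b), (b, a)), stored as nested tuples."""
    a, b = z.real, z.imag
    return ((a, -b), (b, a))

def mat_add(m, n):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(tuple(m[i][j] + n[i][j] for j in range(2)) for i in range(2))

def mat_mul(m, n):
    """Usual matrix product of two 2x2 matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

# Sample values with small integer parts, so floating point arithmetic is exact here.
z, w = complex(2, 3), complex(-1, 4)
assert to_matrix(z + w) == mat_add(to_matrix(z), to_matrix(w))
assert to_matrix(z * w) == mat_mul(to_matrix(z), to_matrix(w))
```

Integer-valued samples are used deliberately: with arbitrary floats the two sides could differ by rounding error even though the identity holds exactly.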

The matrices $\left(\begin{array}{rr} a & -b \\ b & a\end{array}\right)$ are of a suggestive form. If we let $a = r\cos\theta$ and $b = r\sin\theta$ as per the polar form of complex numbers, the corresponding matrix becomes $r\left(\begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\right)$. Note that this is just $r$ multiplying the rotation matrix. In fact, the geometric interpretation of complex numbers is often the way in which the matrix representation of complex numbers is established.

Polynomial Rings, Zeroes and Fields

A trend in modern mathematics is to find abstract and more algebraic ways in which to view common facts. The reason is that abstract, algebraic approaches, which do not appeal to graphical arguments or intuition, can give rise to new ideas. How does this all relate to the problem posed at the outset of this post? We know that $i^2 + 1 = 0$ by definition. From the matrix representation of complex numbers, we know that $i$ corresponds to the matrix $\left(\begin{array}{rr} 0 & -1 \\ 1 & 0\end{array}\right)$. Squaring this matrix, we get $\left(\begin{array}{rr} -1 & 0 \\ 0 & -1 \end{array}\right)$, which is the negative of the identity matrix. Thus, the matrix corresponding to $i$ also solves $x^2 + 1 = 0$.

The obvious question to ask is how to derive the representation $\left(\begin{array}{rr} 0 & -1 \\ 1 & 0\end{array}\right)$ using solely algebraic arguments, without appealing to geometry. To answer this question in meaningful detail, we need to consider a different, equivalent, definition for the complex numbers. To do so, we need some results regarding polynomial rings and what are known as irreducible polynomials. First some definitions.

Definition Let $\mathbb{F}$ be a field. The polynomial ring over $\mathbb{F}$, denoted $\mathbb{F}[x]$, is given by

$$\mathbb{F}[x] = \{a_0+a_1x+\cdots+a_nx^n: a_0,\ldots,a_n\in\mathbb{F}, n\in\mathbb{N}\},$$

where the $a_0+a_1x+\cdots+a_nx^n$ are understood to be formal sums. A rigorous way to view polynomials is as elements of the vector space $\mathbb{F}^{\mathbb{N}}$ such that all but finitely many coefficients are zero (with the identification $1 = e_0$, $x = e_1$, etc.).
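To make the "sequence of coefficients" viewpoint concrete, here is a small Python sketch in which a polynomial is just the list of its coefficients, lowest degree first, with only finitely many nonzero entries. The function names `trim`, `poly_add` and `poly_mul` are chosen for this illustration, and integer coefficients stand in for elements of a general field.

```python
def trim(coeffs):
    """Drop trailing zeros so that each polynomial has one canonical list."""
    coeffs = list(coeffs)
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def poly_add(p, q):
    """Add coefficient lists entrywise, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def poly_mul(p, q):
    """Multiply polynomials: the coefficient of x^(i+j) collects a_i * b_j."""
    if not p or not q:
        return []  # the zero polynomial is the empty list
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

one, x = [1], [0, 1]          # the identifications 1 = e_0 and x = e_1
assert poly_mul(x, x) == [0, 0, 1]                 # x * x = x^2 = e_2
assert poly_mul([1, 1], [1, -1]) == [1, 0, -1]     # (1 + x)(1 - x) = 1 - x^2
```

Nothing here evaluates a polynomial at a point; the lists are purely formal objects, which is the sense of "formal sums" in the definition.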

One of the properties of polynomials we are most interested in is their roots. Given an arbitrary field and polynomial, there is no guarantee that there is a root in the field; e.g. consider $\mathbb{R}$ and $p(x) = x^2+1$. We know that there is no solution to $x^2 + 1 = 0$ in the reals. This motivates the next definition.

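Definition A nonconstant polynomial $p(x)\in\mathbb{F}[x]$ is said to be irreducible if it cannot be written as a product of two polynomials in $\mathbb{F}[x]$ of strictly lower degree. In particular, an irreducible polynomial of degree at least $2$ has no root in $\mathbb{F}$, i.e. $p(c)\neq 0$ for any $c\in\mathbb{F}$; for polynomials of degree $2$ or $3$, having no root in $\mathbb{F}$ is equivalent to being irreducible.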

We'd like to be able to "force" roots to exist for irreducible polynomials. In the case of $p(x) = x^2+1$ and $\mathbb{F}=\mathbb{R}$, we had to add $i$ in order to get a solution. This isn't very satisfactory in general since it is very dependent upon the structure of $\mathbb{R}$. We need a way to force roots in general. A common way in mathematics to force subobjects to be zero is to quotient by them.

Armed with this notion, we might be tempted to consider $\mathbb{F}[x]/(p(x))$ if $p(x)$ is irreducible in order to get a zero of $p(x)$. (Here $(p(x)) = p(x)\mathbb{F}[x]$, i.e. $(p(x))$ is the set of all multiples of $p(x)$.) In $\mathbb{F}[x]/(p(x))$, $p(x) \equiv 0$ so this seems reasonable and since $\mathbb{F}[x]/(p(x))$ is a field, roots may exist.

Moreover, we would think that, since we have set $p(x)$ to zero by quotienting, $x$ should be the root of $p(x)$ in $\mathbb{F}[x]/(p(x))$. Strictly speaking, $x$ is not an element of $\mathbb{F}[x]/(p(x))$, but the coset $x+(p(x))$ is. Let us check that this does indeed work. Writing $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, we have

\begin{eqnarray*} p(x+(p(x))) &=& a_0 + a_1 (x+(p(x))) + \cdots + a_n (x+(p(x)))^n \\ &=& a_0 + a_1 x + \cdots + a_n x^n + (p(x)). \end{eqnarray*}

Note that we get $p(x) + (p(x))$, but since $p(x)\in(p(x))$, this is nothing more than $(p(x))$ - which we have set to zero in $\mathbb{F}[x]/(p(x))$. Hence $x+(p(x))$ is a root of $p(x)$ in $\mathbb{F}[x]/(p(x))$.
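The check that $x+(p(x))$ is a root can also be carried out mechanically. Below is a hedged Python sketch over $\mathbb{Q}$ (via `fractions.Fraction`), where a coset is represented by its remainder on division by $p(x)$. The names `poly_mod`, `poly_mul` and `evaluate_in_quotient` are illustrative choices, polynomials are coefficient lists with the lowest degree first, and the example polynomial $x^3 - 2$ is just a sample.

```python
from fractions import Fraction

def poly_mod(f, p):
    """Remainder of f on division by p (p has nonzero leading coefficient)."""
    f = [Fraction(c) for c in f]
    while len(f) >= len(p):
        factor = f[-1] / p[-1]
        shift = len(f) - len(p)
        for i, c in enumerate(p):
            f[shift + i] -= factor * c
        f.pop()  # the leading coefficient has just been cancelled
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_mul(f, g):
    """Product of two coefficient lists; the empty list is the zero polynomial."""
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def evaluate_in_quotient(q, t, p):
    """Evaluate q at the coset t + (p) by Horner's rule, reducing mod p each step."""
    acc = []
    for coeff in reversed(q):
        acc = poly_mul(acc, t)
        if not acc:
            acc = [Fraction(0)]
        acc[0] += coeff
        acc = poly_mod(acc, p)
    return acc

p = [-2, 0, 0, 1]   # x^3 - 2, a sample polynomial
x = [0, 1]
assert evaluate_in_quotient(p, x, p) == []       # p(x + (p)) is the zero coset
assert evaluate_in_quotient(p, [1], p) == [-1]   # whereas p(1 + (p)) = -1 + (p)
```

The second assertion is there as a sanity check that the evaluation is not trivially zero for every input.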

The Algebraic Structure of $\mathbb{F}[x]/(p(x))$

Even though $\mathbb{F}[x]/(p(x))$ is a field, it is also a vector space over $\mathbb{F}$. This is easy to check. Moreover, it is $n$ dimensional, with basis $1+(p(x)),x+(p(x)),\ldots, x^{n-1}+(p(x))$. These vectors span the space because, by polynomial division, every polynomial is congruent modulo $p(x)$ to its remainder, which has degree less than $n$. They are linearly independent because a nontrivial linear combination of $1,x,\ldots,x^{n-1}$ lying in $(p(x))$ would be a nonzero multiple of $p(x)$ of degree less than $n$, which is impossible since $p(x)$ has degree $n$.

$\mathbb{F}[x]/(p(x))$ not only carries a vector space structure but it inherits even more structure from $\mathbb{F}[x]$ since we can multiply polynomials. This leads into the next definition.

Definition Let $V$ be a vector space over a field $\mathbb{F}$. We say that $V$ is an algebra if $V$ carries with its vector space structure a product $\cdot:V\times V\to V$ satisfying

$(x+y)\cdot z = x\cdot z + y\cdot z$
$x\cdot (y+z) = x\cdot y + x\cdot z$
$(ax)\cdot(by) = (ab)(x\cdot y)$

for all $x,y,z\in V$ and $a,b\in\mathbb{F}$.

$\mathbb{F}[x]/(p(x))$ is an algebra over $\mathbb{F}$ with $\cdot$ denoting coset multiplication. With this, we can finally address the original question and, in particular, the case $p(x) = x^2+1$ over $\mathbb{R}$. Let us consider what happens when the basis elements act on each other by multiplication. Denoting the action by $\rhd$ and dropping the explicit mention of $(p(x))$ for now, we have $1\rhd x^k = x^k$. Additionally, we have that $x\rhd x^k = x^{k+1}$ for $0 \le k \le n-2$ and $x\rhd x^{n-1} = x^n$. Assume that $p(x)$ is monic, i.e. $a_n = 1$ (otherwise divide $p(x)$ by $a_n$, which changes neither its roots nor the ideal $(p(x))$). Since we are working modulo $(p(x))$, $x^n \equiv -a_0 - a_1 x - \cdots - a_{n-1}x^{n-1}$. With respect to the basis $1, x, \ldots, x^{n-1}$, the matrices representing multiplication by $1+(p(x))$ and $x+(p(x))$ are

$$[1+(p(x))] = \left(\begin{array}{ccc} 1 & & \\ & \ddots & \\ & & 1\end{array}\right)$$

and

$$[x+(p(x))] = \left(\begin{array}{cccc} & & & -a_0 \\ 1 & & & -a_1 \\ & \ddots & & \vdots \\ & & 1 & -a_{n-1}\end{array}\right). \tag{1}$$

Here, the empty entries are zero. The matrices of the remaining basis elements are the powers of this one, since $[x^k+(p(x))] = [x+(p(x))]^k$. The algebra defined by the matrices is equivalent to the algebra defined by $\{1+(p(x)),x+(p(x)),\ldots,x^{n-1}+(p(x))\}$. The natural definition of $p$ on the matrix algebra is

\begin{eqnarray*} p\left(\left[x^k + (p(x))\right]\right) &=&\left [p\left(x^k + (p(x))\right )\right]. \end{eqnarray*}

As such, it is manifest that $p([x+(p(x))]) = 0$. Thus, by defining $A = [x+(p(x))]$, we see that $p(A) = 0$ which answers the original question posed at the outset. If we knew that $\mathbb{F}$ was algebraically closed from the outset (or more generally just that $p(x)$ has a root in $\mathbb{F}$), then we could immediately find such an $A$.

If $p(x)$ has a root in $\mathbb{F}$, denote this root by $c$ and let $A = \operatorname{diag}(c,\ldots,c) = cI$. Direct computation shows that $p(A) = p(c)I = 0$. With all of this theory, it is instructive to return to our toy example of $p(x) = x^2+1$.
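Before returning to the toy example, the matrix in equation $(1)$ can be tested directly. The sketch below builds it for a monic polynomial (an assumption of this example: the leading coefficient $1$ is implicit, and only $a_0,\ldots,a_{n-1}$ are passed in) and evaluates $p(A)$ by Horner's rule with exact rational arithmetic. The names `companion`, `mat_mul`, `poly_at_matrix` and `is_zero` are made up for the illustration.

```python
from fractions import Fraction

def companion(coeffs):
    """Matrix (1) for the monic polynomial a_0 + a_1 x + ... + a_{n-1} x^{n-1} + x^n.

    `coeffs` is [a_0, ..., a_{n-1}]; the leading coefficient 1 is implicit.
    """
    n = len(coeffs)
    A = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n):
        A[k][k - 1] = Fraction(1)            # ones on the subdiagonal
    for k in range(n):
        A[k][n - 1] = -Fraction(coeffs[k])   # last column holds -a_k
    return A

def mat_mul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, A):
    """Evaluate a_0 I + a_1 A + ... + a_{n-1} A^{n-1} + A^n by Horner's rule."""
    n = len(A)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for a in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += a                # add a * I
    return result

def is_zero(M):
    return all(entry == 0 for row in M for entry in row)

# x^2 + 1 gives the matrix ((0, -1), (1, 0)), and p(A) = 0.
assert companion([1, 0]) == [[0, -1], [1, 0]]
assert is_zero(poly_at_matrix([1, 0], companion([1, 0])))

# A sample cubic, x^3 - 2x - 5, is annihilated as well.
assert is_zero(poly_at_matrix([-5, -2, 0], companion([-5, -2, 0])))
```

The check never uses irreducibility of the polynomial, only the rule for reducing $x^n$ modulo $p(x)$.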

An Alternate Characterization of $\mathbb{C}$

We already know that $p(x) = x^2+1$ is irreducible over $\mathbb{R}$ and that $\mathbb{R}[x]/(x^2+1)$ is a field as a result. Let us see what field this is, exactly. Since we're setting $x^2+1 = 0$ by quotienting, we should expect that $\mathbb{R}[x]/(x^2+1)$ is $\mathbb{C}$. Since $x^2 + 1$ is of degree $2$, every polynomial in $\mathbb{R}[x]$ is equivalent (modulo $x^2+1$) to exactly one polynomial of the form $a+bx$.

Define $\varphi:\mathbb{R}[x]/(x^2+1)\to\mathbb{C}$ by $\varphi(a+bx+(p(x))) = a+ib$. Clearly this map is surjective, $\varphi(0+(p(x))) = 0$ and $\varphi(1+(p(x))) = 1$. Moreover,

\begin{eqnarray*}\varphi((a+bx+(p(x))) + (c+dx+(p(x)))) &=& \varphi((a+c) + (b+d)x + (p(x))) \\ &=& (a+c)+(b+d)i \\ &=& (a+ bi) + (c+di). \end{eqnarray*}

This last expression is clearly $\varphi(a+bx+(p(x))) + \varphi(c+dx+(p(x)))$. A similar analysis shows that

$$\varphi((a+bx+(p(x)))(c+dx+(p(x)))) = \varphi(a+bx+(p(x)))\varphi(c+dx+(p(x))).$$

This last equality is predicated on the equality $bdx^2 \equiv -bd\pmod{(x^2+1)}$. Because $\varphi$ is injective (as is every homomorphism of fields) and surjective, it is an isomorphism of fields. Hence we can view $\mathbb{C}$ as $\mathbb{R}[x]/(x^2+1)$.
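The multiplicative property is easy to spot-check in code. In the sketch below, a coset $a+bx+(x^2+1)$ is stored as the pair `(a, b)`; the names `quotient_mul` and `phi` are chosen for this example, and the test values are arbitrary small integers so that the floating point comparison is exact.

```python
def quotient_mul(u, v):
    """Multiply a + bx and c + dx, then reduce modulo x^2 + 1."""
    a, b = u
    c, d = v
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd x^2
    const, lin, quad = a * c, a * d + b * c, b * d
    # x^2 is congruent to -1, so the x^2 term folds into the constant term.
    return (const - quad, lin)

def phi(u):
    """The map a + bx + (x^2 + 1) -> a + ib."""
    a, b = u
    return complex(a, b)

u, v = (2, 3), (-1, 4)
assert phi(quotient_mul(u, v)) == phi(u) * phi(v)

# The coset of x squares to the coset of -1, mirroring i^2 = -1.
assert quotient_mul((0, 1), (0, 1)) == (-1, 0)
```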

Using our general expression for a matrix solving $p(A) = 0$ (equation $(1)$), here with $p(x) = x^2+1$, so that $a_0 = 1$ and $a_1 = 0$, we have that a matrix solving $A^2 + I = 0$ is

$$ A = \left(\begin{array}{rr} 0 & -1 \\ 1 & 0\end{array}\right).$$

Since $x$ corresponds to $i$, we see that $i$ is represented by $A$ as we noted above. However there was no geometric argument here at all, only algebraic arguments! Since any complex number is of the form $a+ib$, it is represented by $aI + bA$, and we see that

$$a+ib \Longleftrightarrow \left(\begin{array}{rr} a & -b \\ b & a\end{array}\right).$$

This surprising detour through some seemingly unrelated mathematics naturally gave us the matrix representation for complex numbers that we've seen many times before.

Tuesday, March 10, 2015
