Saturday, October 27, 2012

The Split-Complex Numbers and Special Relativity

Recall from my post about the quaternions that I had mentioned the split-complex (hyperbolic) numbers. I now wish to return to these and uncover the nature of calculus on the split-complex plane and determine what it says about the nature of physics. The split-complex numbers are inherently linked to special relativity and hence are linked to our understanding of the world. Therefore a foray into the world of split-complex numbers will lead to fundamental insight into how nature works. For the purposes of this post, I will consider $1+1$ dimensional spacetime (one spatial variable and one temporal variable).

The Hyperbolic Numbers

I will denote the set of split-complex numbers by $\Bbb H$ (for hyperbolic numbers). Recall that the split-complex numbers arise when one defines a quantity $j$ such that $j^2 = 1$ but $j$ is neither $1$ nor $-1$; numbers of the form $a+bj$ are then the split-complex numbers (in this post I assume $a$ and $b$ are real). I also called such numbers hyperbolic numbers but did not give any explanation for why they have such a moniker. To understand this description one must consider the modulus of a split-complex number $z = a+bj$. In the case of the complex numbers, the level curves of the modulus-square are circles, and so the modulus-square suggests a circular structure on the complex numbers. The modulus also takes a complex number and outputs a real number, i.e. $|\cdot|:\Bbb C\to\Bbb R$ (recall that $|z|^2 = zz^*$).

By analogy, we want some sort of product that takes $\Bbb H\times \Bbb H$ into $\Bbb R$, and it turns out that the conjugation prescription is the way to do this. If we fix $z=x+jy$, its hyperbolic conjugate is $z^* = x-jy$ and their product is given by $zz^* = (x+jy)(x-jy) = x^2-y^2$. So we see that multiplying a split-complex number by its conjugate gives a real number as desired, with one caveat: this number is no longer strictly positive. The form of the modulus-square is very reminiscent of the equation of a hyperbola, $\dfrac{x^2}{a^2}-\dfrac{y^2}{b^2} = 1$, and this is why the split-complex numbers are often referred to as hyperbolic numbers. (Some refer to $zz^*$ as the modulus but I think it causes confusion when compared with the complex case, so I elect to call it the modulus-square.) One peculiar aspect of the hyperbolic numbers is that a hyperbolic number of the form $x+jx$ (or $x-jx$) has a modulus-square of $0$ which, as we will see, is a bit troublesome but very important.

Now that we have come up with the modulus, it is natural to ask if we can divide by hyperbolic numbers since this was the progression we took when considering the complex numbers. If we want to find an inverse for a hyperbolic number $z$, we want to determine what $z'$ in $\Bbb H$ satisfies $zz' = 1$. Since multiplication of hyperbolic numbers is commutative (meaning the order in which we multiply doesn't matter), if we find out what $z'$ satisfies $zz' = 1$, then we know that it also satisfies $z'z = 1$ and so it is the unique inverse.

So let us multiply both sides of the equality $zz' = 1$ by $z^*$ to get that $z^*zz' = z^*$, therefore $z' = \dfrac{z^*}{z^*z}$. If $z=x+jy$, then we have that $z' = \dfrac{x-jy}{x^2-y^2}$ and we have an explicit expression for the inverse of a hyperbolic number. As noted above if $z=x+jx$, then its modulus is $0$ and so the inverse of $z$ cannot hope to make sense with the above expression since we would end up dividing by $0$. One can actually show that no $z'\in\Bbb H$ exists so that $zz' = 1$ if $z=x+jx$. Geometrically one can say that the hyperbolic numbers along the lines $y=\pm\, x$ in the hyperbolic plane do not have inverses. These two lines are very important in special relativity and are known as the light cone. I will explore this further later in the post.
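To make this concrete, here is a minimal Python sketch of split-complex arithmetic (the class and method names are my own invention for this post): multiplication, conjugation, the modulus-square, and inversion, with the light-cone failure showing up as a division-by-zero error.

```python
# A minimal sketch of split-complex (hyperbolic) arithmetic.
# The class and method names are invented for illustration.

class SplitComplex:
    def __init__(self, x, y):
        self.x, self.y = x, y          # z = x + j*y with j**2 = +1

    def __mul__(self, other):
        # (x1 + j*y1)(x2 + j*y2) = (x1*x2 + y1*y2) + j*(x1*y2 + y1*x2)
        return SplitComplex(self.x * other.x + self.y * other.y,
                            self.x * other.y + self.y * other.x)

    def conj(self):
        return SplitComplex(self.x, -self.y)   # z* = x - j*y

    def modulus_square(self):
        return self.x**2 - self.y**2           # z z* = x^2 - y^2

    def inverse(self):
        m = self.modulus_square()
        if m == 0:
            raise ZeroDivisionError("no inverse on the light cone y = ±x")
        zc = self.conj()
        return SplitComplex(zc.x / m, zc.y / m)

    def __repr__(self):
        return f"{self.x} + {self.y}j"

z = SplitComplex(3.0, 1.0)
print(z.modulus_square())            # 8.0
print(z * z.inverse())               # 1.0 + 0.0j
# SplitComplex(2.0, 2.0).inverse()   # would raise: lies on the light cone
```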

We have now built up hyperbolic numbers, their modulus(-square) and their inverses. It is then reasonable to ask if they have a polar-like representation like complex numbers (each complex number $z$ except $0$ can be written in the form $r\exp(i\theta)$ where $r$ is the modulus and $\theta\in [0,2\pi)$). We already have the modulus for hyperbolic numbers so we need to explore what, if anything, $\exp(j\theta)$ is. I will make the assumption that hyperbolic series converge nicely and the following proof is not meant to be rigorous but merely instructive. In the "proof" we will need the fact that $j^n = 1$ if $n$ is even and $j^n = j$ if $n$ is odd.

$$\exp(j\theta) = \sum\limits_{n=0}^{\infty} \frac{(j\theta)^n}{n!} = \sum\limits_{n=0}^{\infty} \frac{j^n\theta^n}{n!}.$$

At this point it would be prudent to separate our above series into two separate series corresponding to the case of $j^{\text{even}}$ and $j^{\text{odd}}$ giving

$$\exp(j\theta) = \sum\limits_{n=0}^{\infty} \frac{j^{2n}\theta^{2n}}{(2n)!} + \sum\limits_{n=0}^{\infty}\frac{j^{2n+1}\theta^{2n+1}}{(2n+1)!} = \sum\limits_{n=0}^{\infty} \frac{\theta^{2n}}{(2n)!}+j\sum\limits_{n=0}^{\infty} \frac{\theta^{2n+1}}{(2n+1)!}. $$

For those familiar with their hyperbolic trigonometric functions, the above can easily be recognized as

$$ \exp(j\theta) = \cosh(\theta) + j\sinh(\theta). $$
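As a quick (non-rigorous) sanity check of this identity, here is a small Python sketch comparing a truncated version of the series above against $\cosh$ and $\sinh$; the function name and number of terms are arbitrary choices of mine.

```python
import math

def exp_j(theta, terms=30):
    """Partial sums of sum_n (j*theta)^n / n!, split into (real, j) parts
    using j^even = 1 and j^odd = j."""
    re = sum(theta**n / math.factorial(n) for n in range(0, terms, 2))
    im = sum(theta**n / math.factorial(n) for n in range(1, terms, 2))
    return re, im

theta = 0.7
print(exp_j(theta))                            # ≈ (1.2551690..., 0.7585837...)
print(math.cosh(theta), math.sinh(theta))      # matches cosh and sinh
```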

We would then like to represent any $z\in \Bbb H$ (with the possible exception of the hyperbolic numbers on the light cone, since they have modulus-square $0$) in the form $r\exp(j\theta)$. Note: $\theta$ no longer represents an angle but is more like a parameter along the hyperbola and now varies from $-\infty$ to $\infty$, and since $r$ is a radius, it must be strictly positive. In the complex number case, $r$ was related to the geometry of the complex numbers, i.e. the level curves of $zz^*$ (circles).

We wish the same to hold for the hyperbolic case, so that $r$ is given by the level curves of $zz^*$ (hyperbolas). If $z=x+jy$, then define $r$ to be $\sqrt{|x^2-y^2|}$ so that it is always positive. Knowing $r$ is not enough to tell us where our point $z$ lies in the hyperbolic plane, since it could lie on any one of four hyperbolas and could be anywhere on each of them, so we must figure out which hyperbola it lies on and where on that hyperbola it sits. We have two cases (with two subcases each) to consider: $|x| > |y|$ (with subcases $x>y$ and $x<y$) and $|y| > |x|$ (with subcases $y>x$ and $y < x$). In each case, $\theta$ can be determined from the following: $x+jy = r(\cosh(\theta)+j\sinh(\theta))$, or $x=r\cosh(\theta)$ and $y = r\sinh(\theta)$. Therefore $\tanh(\theta) = \dfrac{y}{x}$, or $\theta = \tanh^{-1}\left(\frac{y}{x}\right)$.


[Figure: hyperbolas (level curves of $zz^*$) in the hyperbolic plane; the light cone is shown in red. Courtesy of Wikipedia.]


Case 1: $|x| > |y|$ and $x > y$. This implies that $x$ is positive and the point lies between the lines $y = \pm\,x$, and so $z$ lies on the right-opening hyperbola seen in the image above.

Case 2: $|x| > |y|$ and $x < y$. This implies that $x$ is negative and the point lies between the lines $y = \pm\,x$, so $z$ lies on the left-opening hyperbola; since $\cosh(\theta)$ is always positive, here one writes $z = -r\exp(j\theta)$. One can easily see how the other two cases pan out (on the upward- and downward-opening hyperbolas one writes $z = \pm\, jr\exp(j\theta)$).

We now have a complete hyperbolic-polar description of any $z\in\Bbb H$. Since 1+1 dimensional spacetime can be viewed with the hyperbolic numbers, it is natural to ask what calculus looks like on such a space to see if we can uncover any truths about nature.
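For concreteness, here is a Python sketch of the hyperbolic-polar decomposition just described; the bookkeeping of the four branches by prefactors $\pm 1$ and $\pm j$ follows the case analysis above, and the function name is my own.

```python
import math

def hyperbolic_polar(x, y):
    """Return (r, theta, branch) so that z = x + j*y equals
    branch * r * exp(j*theta), where branch is one of +1, -1, +j, -j
    (reported here as a string).  Points on the light cone are rejected."""
    m = x * x - y * y
    if m == 0:
        raise ValueError("points on the light cone have no hyperbolic-polar form")
    r = math.sqrt(abs(m))
    if abs(x) > abs(y):                    # right- or left-opening branch
        theta = math.atanh(y / x)
        branch = '+1' if x > 0 else '-1'
    else:                                  # upward- or downward-opening branch
        theta = math.atanh(x / y)
        branch = '+j' if y > 0 else '-j'
    return r, theta, branch

print(hyperbolic_polar(3.0, 1.0))    # (sqrt(8), atanh(1/3), '+1')
print(hyperbolic_polar(1.0, -3.0))   # (sqrt(8), atanh(-1/3), '-j')
```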

Calculus on $\Bbb H$

The study of calculus on $\Bbb H$ will be very similar in methodology to that of the study of calculus on $\Bbb C$. Suppose we have a hyperbolic function $f:\Bbb H\to\Bbb H$ (again, I should restrict myself to sets in $\Bbb H$, but this will do for simplicity's sake), then we write $f(z) = f(x+jy) = f(x,y)$. $f$ associates each $(x,y)\in \Bbb H$ with a pair $(x',y')\in \Bbb H$ (here I'm using the notion that each $x+jy$ can be viewed as a coordinate pair $(x,y)$ in the plane). We can then write $f(x,y) = u(x,y)+jv(x,y)$, where $u$ associates $(x,y)$ with $x'$ and $v$ associates $(x,y)$ with $y'$.

Since we can divide by hyperbolic numbers, we can potentially come up with a notion of differentiability similar to that with complex-differentiable functions (as long as we don't divide by $0$ of course). Let us try to apply the standard definition of a derivative from elementary calculus to $f$ and see what kind of coupling there is between our $u$ and $v$. From the definition of a derivative we know that if

$$ \lim_{\Delta z\rightarrow 0}\frac{f(z+\Delta z)-f(z)}{\Delta z} $$

exists, then $f$ is differentiable at $z$. Let us again look at the limit as we approach $z$ along horizontal and vertical lines. If we approach along a horizontal line, $\Delta z = \Delta x$, and if we approach along a vertical line, $\Delta z = j\Delta y$. We then have two expressions for the derivative:

$$ \lim_{\Delta z\rightarrow 0}\frac{f(z+\Delta z) - f(z)}{\Delta z} = \lim_{\Delta x\rightarrow 0}\frac{u(x+\Delta x,y)-u(x,y)+jv(x+\Delta x,y)-jv(x,y)}{\Delta x} $$

along horizontal lines and

$$ \lim_{\Delta z\rightarrow 0} \frac{f(z+\Delta z)-f(z)}{\Delta z} = \lim_{\Delta y\rightarrow 0}\frac{u(x,y+\Delta y)-u(x,y)+jv(x,y+\Delta y)-jv(x,y)}{j\Delta y} $$

along vertical lines. Like in the $\Bbb R^2$ case, we require that if a derivative exists at a point, the value must be the same along every path to the point and so the two expressions above must be equal if $f$ is to be differentiable at $z$. We can recognize the above expressions as the partial derivatives of $u$ and $v$ along the $x$ and $y$ directions. Since the expressions must be equal we have that

$$ \frac{\partial u}{\partial x} + j\frac{\partial v}{\partial x} = j\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} $$

since $j^{-1} = j$. If we equate the "real" and "imaginary" parts we have that

$$ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} $$

and

$$ \frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}. $$

These two equations are similar to the Cauchy-Riemann equations for complex-differentiable functions. If we took the partial derivative of the first equation with respect to $y$ and the second with respect to $x$ and subtracted them, we would see that $v$ satisfies the following partial differential equation

$$ \frac{\partial^2 v}{\partial x^2} - \frac{\partial^2 v}{\partial y^2} = 0 $$

assuming that $u$ is twice continuously differentiable. Similarly (assuming the same of $v$), $u$ solves the following equation

$$ \frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = 0 . $$

Since both $u$ and $v$ solve this equation, so does $f$, since differentiation is a linear operator. This partial differential equation is very important, though you may not recognize it at first glance. Implicitly we have been treating $x$ and $y$ on the same footing, and if we wish to relate them back to physical quantities, we must recognize that if $x$ is a coordinate variable (having units of length), then $y$ must also be a coordinate variable (having units of length). If we make the change of variables $x = ct$, then we have that $f$ solves the following partial differential equation

$$ \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2} - \frac{\partial^2 f}{\partial y^2} = 0 $$

which is the wave equation in 1+1 dimensions. So we see that differentiable functions on $\Bbb H$ must solve the wave equation. The wave equation is fundamental to special relativity because it governs the behavior of light as was recognized by James Clerk Maxwell.
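As a quick check of the preceding claims, here is a sympy sketch using the hyperbolic-differentiable function $f(z) = z^2$ (my own choice of test function), whose components are $u = x^2 + y^2$ and $v = 2xy$; both the coupling equations and the wave-type equation come out to zero.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# f(z) = (x + j*y)**2 = (x**2 + y**2) + j*(2*x*y), using j**2 = 1
u = x**2 + y**2
v = 2*x*y

# hyperbolic Cauchy-Riemann-type equations
print(sp.simplify(sp.diff(u, x) - sp.diff(v, y)))   # 0
print(sp.simplify(sp.diff(u, y) - sp.diff(v, x)))   # 0

# both components solve the 1+1 dimensional wave-type equation
print(sp.simplify(sp.diff(u, x, 2) - sp.diff(u, y, 2)))   # 0
print(sp.simplify(sp.diff(v, x, 2) - sp.diff(v, y, 2)))   # 0
```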

Connections to Einstein's Theory of Special Relativity

Thus far we have seen some connections to special relativity: hyperbolic numbers of the form $x\pm jx$ denote the light cone of special relativity and differentiable functions on $\Bbb H$ solve the wave equation (and are thusly related to light in some fashion). Can we uncover any more physics by working with the hyperbolic numbers? It turns out we can!

Let us define a "rotation" in $\Bbb H$ of a hyperbolic number $z$ by "angle" $\varphi = \tanh^{-1}\left(\dfrac{v}{c}\right)$ to be given by $z\exp(j\theta)$ (we have implicitly assumed that  $\left|\dfrac{v}{c}\right| < 1$ since that is the domain of $\tanh^{-1}$ - this assumption will be important physically). What this does physically to $z$ is moves $z$ along the hyperbola it lies on by an "angle" $\varphi$. If we write $z = ct+jx$, can we represent $z' = z\exp(j\varphi)$ in terms of $v$, $t$, and $x$?

If we write $\exp(j\varphi)$ as $\cosh(\varphi)+j\sinh(\varphi)$, we have that $\cosh(\varphi) = \dfrac{1}{\sqrt{1-\left(\frac{v}{c}\right)^2}}\,\,$ and $\,\,\sinh(\varphi) = \dfrac{\frac{v}{c}}{\sqrt{1-\left(\frac{v}{c}\right)^2}}$. For simplicity, I will define $\gamma$ to be $\dfrac{1}{\sqrt{1-\left(\frac{v}{c}\right)^2}}$. Therefore

$$z' = ct' + jx' = (ct+jx)\left(\gamma + j\gamma\dfrac{v}{c}\right) = \gamma\left(ct+\dfrac{v}{c}x\right)+j\gamma\left(x+vt\right)$$

and so $t' = \gamma\left(t+\frac{vx}{c^2}\right)$ and $x' = \gamma\left(x+vt\right)$. These are just the standard Lorentz transformations between two inertial reference frames (up to the sign convention on $v$), and so we see that Lorentz transformations are just rotations in the hyperbolic plane.

The interpretation of this is that if we compare our frame of reference to a frame of reference that is moving with uniform velocity relative to us (with $v < c$, since $\tanh^{-1}$ cannot be defined in the usual sense if $v$ is greater than or equal to $c$), then the way to relate our coordinates is just a Lorentz transformation, and a given event lies on the same hyperbola in the hyperbolic plane in both frames (since one is just a rotation of the other).
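Here is a small numerical sketch of this correspondence (with $c = 1$ as my own simplifying choice): multiplying $ct + jx$ by $\exp(j\varphi)$ with $\varphi = \tanh^{-1}(v/c)$ reproduces the transformation formulas above and leaves the interval $c^2t^2 - x^2$ unchanged.

```python
import math

def boost(t, x, v, c=1.0):
    """Hyperbolic rotation of z = c*t + j*x by phi = atanh(v/c)."""
    phi = math.atanh(v / c)
    ch, sh = math.cosh(phi), math.sinh(phi)     # gamma and gamma*v/c
    ct = c * t
    ct_new = ct * ch + x * sh                   # real part of z*exp(j*phi)
    x_new = ct * sh + x * ch                    # j part
    return ct_new / c, x_new

t, x, v = 2.0, 1.0, 0.6
t2, x2 = boost(t, x, v)
gamma = 1 / math.sqrt(1 - v**2)
print(t2, gamma * (t + v * x))                  # agree: t' = gamma*(t + v*x/c^2)
print(x2, gamma * (x + v * t))                  # agree: x' = gamma*(x + v*t)
print(t**2 - x**2, t2**2 - x2**2)               # invariant interval (c = 1)
```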

There is more that could be said about the connections between the hyperbolic numbers and special relativity but I will stop here. The above notion can be generalized to $3+1$ dimensions ($3$ spatial dimensions, $1$ time dimension) by including two more "imaginary" units like $j$, which serve as place-keepers for the spatial components. In $3+1$ dimensions, the light cone is a cone embedded in $4$-dimensional space and the level sets of $zz^*$ are now hyperboloids, but the ideas carry over somewhat naturally.

Saturday, October 20, 2012

Complex Analysis and Real Analysis on $\Bbb R^2$


(Note: In this post I will refer to the complex numbers by the set $\Bbb C$ and real numbers by $\Bbb R$. $\Bbb R^n$ is merely the Cartesian product of $\Bbb R$ with itself $n$ times (which is the same as vectors with $n$ components). If $n=1$, this is simply the real line; if $n=2$, this is the $xy$ plane; and if $n=3$, this is three dimensional space as we know it.)

In my last post I discussed the complex numbers and I felt this was a good time to segue into complex analysis and how it differs from real analysis on $\Bbb R^2$. It will turn out that because there is a well-defined notion of multiplication of complex numbers, there is a well-defined notion of division of complex numbers, and this fact is responsible for why calculus on $\Bbb C$ is so different from calculus on $\Bbb R^2$ even though they both represent the $xy$ plane. Before I delve into complex analysis, I will first explore calculus on $\Bbb R^2$.

Calculus on $\Bbb R^2$

(Side note: In the upcoming sections I should say that the functions of interest are defined on subsets of $\Bbb R^n$ and $\Bbb C$ but I will omit this technicality except in the case of functions on $\Bbb R$ because it only makes definitions longer without adding much to the actual concept.)

Firstly, "calculus on $\Bbb R^2$" is a bit ambiguous since one can consider functions from $\Bbb R^2$ to $\Bbb R$, $\Bbb R^2$ to $\Bbb C$ or $\Bbb R^2$ to $\Bbb R^n$. Examples of such functions include $f(x,y) = x$, $g(x,y) = x+iy$ and $h(x,y) = (x,0,\ldots,0)$, respectively. I will restrict my discussion to $\Bbb R^2$ since complex functions take $\Bbb C$ to $\Bbb C$ and the analogy is best served in this light. Since this post is about calculus on certain spaces, we need to explore topics in calculus. The simplest one to consider is, of course, limits, but limits behave similarly in $\Bbb R^2$ and $\Bbb C$ since they "look the same" (effectively have the same notion of distance). The next layer of abstraction would then be differentiation and this is where we begin to see differences between real and complex analysis.

We wish to consider differentiable functions from $\Bbb R^2$ to $\Bbb R^2$, but what is a derivative between these two sets? What does it look like? It turns out that derivatives of functions from $\Bbb R^n$ to $\Bbb R^m$ are represented by $m\times n$ matrices. To develop the notion of a derivative of functions from $\Bbb R^n$ to $\Bbb R^m$, one must reconsider what a derivative is, because the definition given in many introductory calculus courses does not carry over directly.

The standard definition of a derivative (in introductory calculus courses) follows. A function $f:[a,b]\to\Bbb R$ has a derivative at some point $c\in(a,b)$ if

$$ \lim_{h\rightarrow 0} \frac{f(c+h)-f(c)}{h} $$

exists. This quantity is called the derivative of $f$ at $c$ and is denoted $f'(c)$. We would like to adapt this definition for the case of functions from $\Bbb R^n$ to $\Bbb R^m$, so let's attempt to do so.

Suppose $f:\Bbb R^n\to\Bbb R^m$. Let's try to define the derivative like we do above. Let $\vec{x},\vec{h}\in \Bbb R^n$, then if we haphazardly apply the previous definition we have

$$ \lim_{\vec{h}\rightarrow \vec{0}} \frac{f(\vec{x}+\vec{h})-f(\vec{x})}{\vec{h}}. $$

However this isn't well defined. The numerator makes sense, but it doesn't make sense to divide by a vector. This harks back to my point in the introduction to this post. We cannot divide by a vector, so this notion of differentiation cannot possibly work out. Now, we could define some notion of multiplication of vectors so that we can define division by vectors (or rather, the inverse of a vector), but such a choice is entirely non-unique and only complicates the matter.

Remark: The first definition of a derivative is a limit, and when working with limits one inherently needs a notion of "length" (really one needs a distance, but if one can speak about lengths of vectors, one can speak about distances between vectors). That is, we need to be able to say vectors are getting close to one another. There are many possible definitions, but the one I will stick to is the most familiar to you: the Euclidean distance. In $\Bbb R^n$ the Euclidean length of a vector $\vec{x}$ is given by

$$ \|\vec{x}\| = \left(\sum_{i=1}^n x_i^2\right)^{\frac{1}{2}}, $$

where the $x_i$ are the components of $\vec{x}$. This can be thought of as the familiar Pythagorean theorem generalized to $n$ dimensions. Now when doing limits in $\Bbb R^n$, the notion of distance between vectors $\vec{x}$ and $\vec{y}$ will be given by $\|\vec{x}-\vec{y}\|$, so we say that a sequence of vectors $\vec{x}_m$ has a limit vector of $\vec{L}$ if $\|\vec{x}_m-\vec{L}\|$ goes to $0$, meaning our vectors get close to $\vec{L}$.

With this machinery we can start discussing derivatives of functions from $\Bbb R^n$ to $\Bbb R^m$. One important property of the derivative on $\Bbb R$ is that it linearizes a function, that is to say that if we have a function $f$ that is differentiable at $x$, then $f(x+a)\approx f(x)+f'(x)a$, when $a$ is small. This property is very powerful because it allows us to get approximate values of a function based on knowing its derivative and value at a point. If we are to generalize this idea to work on $\Bbb R^n$ (to $\Bbb R^m$), we will need that $f'(x)$ is an $m\times n$ matrix so that multiplication by a vector in $\Bbb R^n$ makes sense (analogous to multiplying by $a$ above) and gives a vector in $\Bbb R^m$. (If you recall from multivariable calculus, this matrix is the Jacobian.)

To formalize the above approximation, one can say that $f(x+a)-f(x)-f'(x)a = o(a)$, i.e. the remainder goes to $0$ faster than $a$ does (for sufficiently smooth $f$ the remainder is in fact $O(a^2)$), so that when $a$ is small these terms are negligible. If we massage this a little we get the following definition for differentiability (which is equivalent to the first definition above):

A function $f:[a,b]\to\Bbb R$ is differentiable at $c\in(a,b)$ if there exists $L\in\Bbb R$ so that

$$ \lim_{h\rightarrow 0} \frac{f(c+h)-f(c)-Lh}{h} = 0 ,$$

where $L$ is called the derivative of $f$ and is denoted $f'(c)$. It turns out that this definition carries over very well to the case of functions from $\Bbb R^n$ to $\Bbb R^m$ and the definition is as follows. A function $f:\Bbb R^n\to\Bbb R^m$ is differentiable at $\vec{x}\in\Bbb R^n$ if there exists an $m\times n$ matrix $L$ such that

$$ \lim_{\|\vec{h}\|\rightarrow 0} \frac{\|f(\vec{x}+\vec{h})-f(\vec{x})-L\vec{h}\|}{\|\vec{h}\|} = 0 ,$$

where $L$ is called the derivative of $f$ and is denoted $f'(\vec{x})$. In this definition you can see the pieces we put together: dividing by $\|\vec{h}\|$ instead of $\vec{h}$ and the linearization behavior of the derivative. The reason we take the norm of the numerator is so that we don't have to worry about it when we do the limit (though it wasn't entirely necessary since we need to speak of lengths anyway when doing limits). And now we have a proper notion of differentiation of functions from $\Bbb R^n$ to $\Bbb R^m$ (it turns out that this definition of a derivative can be generalized to functions from more abstract spaces - like function spaces so you'd be taking derivatives of functions of functions). In the case of functions from $\Bbb R^2$ to $\Bbb R^2$, the derivative is a $2\times 2$ matrix. It will turn out that derivatives of functions from $\Bbb C$ to $\Bbb C$ are much different due to the ability to divide and multiply complex numbers.
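To see the definition in action, here is a small numpy sketch (the test function $f(x,y) = (x^2 y,\ \sin y)$ is my own example) checking that the Jacobian matrix satisfies the limit above: the error ratio shrinks as $\|\vec{h}\|$ does.

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([x**2 * y, np.sin(y)])

def jacobian(p):
    x, y = p
    # partial derivatives arranged as a 2x2 matrix (the Jacobian)
    return np.array([[2 * x * y, x**2],
                     [0.0, np.cos(y)]])

p = np.array([1.0, 2.0])
L = jacobian(p)
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = eps * np.array([0.3, -0.7])           # an arbitrary direction
    ratio = np.linalg.norm(f(p + h) - f(p) - L @ h) / np.linalg.norm(h)
    print(eps, ratio)                         # ratio tends to 0 with eps
```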

Calculus on $\Bbb C$

A function $f$ on $\Bbb C$ can be written as $f(z) = f(x+iy)$ (recall from my last post that we write $z=x+iy$). This will, in general, assign each $(x,y)\in\Bbb C$ to a pair $(x',y')\in\Bbb C$, so we will write $f(x,y) = u(x,y)+iv(x,y)$. We say that $u$ is the real part of $f$ and $v$ is the imaginary part of $f$, that is to say that $u$ associates $(x,y)$ with $x'$ and $v$ associates $(x,y)$ with $y'$. Like above, we would like to define a length of complex numbers. Since $\Bbb C$ and $\Bbb R^2$ both have the same structure with respect to addition, they both look the same (meaning we associate each complex number $x+iy$ with an ordered pair $(x,y)$ in the plane $\Bbb R^2$). In this way you can kind of view complex numbers as two-dimensional vectors with one difference: we can multiply and divide by complex numbers. Since $\Bbb C$ can be viewed as a plane of points and we can associate $x+iy$ with the vector $(x,y)$, we can speak of the length of a complex number.

By inspection we see that the length of $x+iy$ (denoted $|x+iy|$ - this is called the modulus or norm of $x+iy$) is $\sqrt{x^2+y^2}$. It turns out that we can write this as $\sqrt{(x+iy)(x-iy)}$ (check this for yourself). The quantity $x-iy$ looks very similar to $x+iy$ but with the sign of $iy$ changed, and this is in fact the complex conjugate of $x+iy$ (the complex conjugate changes the sign of the term with $i$ in it). If we write $z=x+iy$, then we write $z^*=x-iy$ and call $z^*$ the complex conjugate of $z$. Other common notation is $\bar{z}$ for the complex conjugate. Then we write that $|z|^2 = zz^* = x^2+y^2$.

Since we can multiply complex numbers, we should be able to divide by complex numbers (with the exception of $0$, since dividing by $0$ is not well-defined). Let $z,z'$ be complex numbers with $z$ given. If we want to find what the inverse of $z$ is, we want to solve $zz' = 1$ for $z'$. If we multiply both sides by $z^*$, we get that $(x^2+y^2)z' = z^*$, so that $z' = \dfrac{z^*}{x^2+y^2}$ and so we can take inverses of complex numbers that aren't $0$! This fact becomes very important when considering derivatives of functions from $\Bbb C$ to $\Bbb C$.

Now we have built up all of the necessary machinery for talking about derivatives of functions from $\Bbb C$ to $\Bbb C$. Let us try to apply the original definition of a derivative and see if there are any difficulties like in the real-variable case.

Suppose $f:\Bbb C\to\Bbb C$. Let us haphazardly use the definition of a derivative from the $\Bbb R$ to $\Bbb R$ case (with minor changes) and see if anything goes wrong. We then wish to look at the following

$$ \lim_{\Delta z\rightarrow 0} \frac{f(z+\Delta z)-f(z)}{\Delta z}. $$

Since we can add, subtract and divide complex numbers (as long as we never divide by zero), we can make sense of this limit, unlike in the real variable ($\Bbb R^2$) case! There is no need to talk about dividing by the lengths of complex numbers (norms) in this case as a result. Since this quotient makes sense, we can define the derivative of a complex function with it.

Let $f:\Bbb C\to\Bbb C$. It is said to be complex differentiable at $z$ if

$$ \lim_{\Delta z\rightarrow 0} \frac{f(z+\Delta z)-f(z)}{\Delta z} $$

exists. We call this the derivative of $f$ at $z$ and denote it by $f'(z)$.
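A quick numerical sketch of this definition (the test function $f(z) = z^2$ and the chosen directions are my own examples): the difference quotient approaches $2z$ no matter which direction $\Delta z$ shrinks from.

```python
import cmath

f = lambda z: z**2
z = 1.0 + 2.0j

for angle in [0, cmath.pi / 2, cmath.pi / 4, 1.234]:     # several approach directions
    dz = 1e-6 * cmath.exp(1j * angle)
    print((f(z + dz) - f(z)) / dz)                        # all close to 2z = 2+4j
```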

Recall from multivariable real analysis that if a limit of a function exists at a particular point, the value should come out the same regardless of how we approach that point. Making that same restriction here, we can come up with equations that couple the $u$ and $v$. More specifically, if we approach $z$ along a horizontal and vertical line, we should get the same result for the derivative.

Let's assume $f$ is complex differentiable and see what comes of it. Along horizontal lines, $z = x+iy_0$, where $y_0$ is constant, and along vertical lines, $z = x_0+iy$, where $x_0$ is constant. Therefore, along horizontal lines, $\Delta z = \Delta x$; along vertical lines, $\Delta z = i\Delta y$. Hence our expression for the derivative becomes (along a horizontal line)

$$ \lim_{\Delta x\rightarrow 0} \frac{f(x+iy+\Delta x) - f(x+iy)}{\Delta x} = \lim_{\Delta x\rightarrow 0} \frac{u(x+\Delta x, y) + iv(x+\Delta x,y) - u(x, y) - iv(x,y)}{\Delta x}. $$

Along a vertical line, our expression for the derivative becomes

$$ \lim_{\Delta y\rightarrow 0} \frac{f(x+iy+i\Delta y) - f(x+iy)}{i\Delta y} = \lim_{\Delta y\rightarrow 0} \frac{u(x,y+\Delta y) +iv(x, y+\Delta y) - u(x,y) - iv(x,y)}{i\Delta y} .$$

We can recognize the limit along the $x$ direction as the derivative of $u$ and $v$ with respect to $x$, giving

$$ f '(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} $$

and we can recognize the limit along the $y$ direction as the derivative of $u$ and $v$ with respect to $y$, giving

$$ f '(z) = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} .$$

Equating the real and imaginary parts in these two expressions leads to the following relations

$$ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} $$

and

$$ \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} . $$

These are known as the Cauchy-Riemann equations. It turns out that these equations imply that $u$ and $v$ are both harmonic functions, and a lot of analysis of such functions has been done (the maximum principle, for example). These two seemingly innocuous equations are at the heart of why complex analysis is so different from real analysis. In the real case, there is no such coupling between the components of the function (if you separate them similarly to what we did here); the coupling appears in the complex case simply because we can divide by complex numbers.
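Here is a short sympy sketch (with $f(z) = z^2$, so $u = x^2 - y^2$ and $v = 2xy$, my own choice of test function) verifying that the Cauchy-Riemann equations hold and that both components are harmonic.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# f(z) = (x + i*y)**2 = (x**2 - y**2) + i*(2*x*y)
u = x**2 - y**2
v = 2*x*y

# Cauchy-Riemann equations
print(sp.diff(u, x) - sp.diff(v, y))    # 0
print(sp.diff(u, y) + sp.diff(v, x))    # 0

# both components are harmonic: u_xx + u_yy = 0 and v_xx + v_yy = 0
print(sp.diff(u, x, 2) + sp.diff(u, y, 2))    # 0
print(sp.diff(v, x, 2) + sp.diff(v, y, 2))    # 0
```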

It turns out that if a complex function is complex differentiable on an open set, then it is infinitely differentiable on that open set. This shows that complex differentiability is much stronger than ordinary differentiability. One particular oddity of complex analysis is that if one has a complex function that is differentiable everywhere (such a function is called entire), then it is either constant or it must necessarily be unbounded (this is Liouville's theorem). Contrast this with the real case: $\sin(xy)$ is real differentiable everywhere and is bounded, but is not constant. Complex analysis is rife with counter-intuitive results such as this. Another example is that if one has an entire function whose image omits two (or more) points in $\Bbb C$, then it must be constant (this is Picard's little theorem).

So despite the vast similarities between $\Bbb R^2$ and $\Bbb C$, defining multiplication of complex numbers vastly changes the landscape of calculus between the two spaces, as is evidenced by the Cauchy-Riemann equations and other results. The beauty of complex analysis truly stems from the added structure of multiplication of complex numbers; without a notion of multiplication, calculus on $\Bbb C$ would be the same as calculus on $\Bbb R^2$. There is a lot more that could be said but I feel this is a good place to stop, especially since the post got away from me again and ended up being much longer than I intended.

Saturday, October 13, 2012

The Complex Numbers, Quaternions and Split-Complex Numbers

Recently it was asked on /r/math on reddit what the quaternions were and where they came from and I figured this was a great opportunity to write a blog post. I will briefly cover what complex numbers are and then jump into the inception of the quaternions. I will also have a brief discussion at the end on the split-complex (or hyperbolic) numbers since they are related to the complex numbers and so are related to the quaternions.

The complex numbers arose in the context of solving polynomials of degree two and higher, specifically when solving equations like $x^2+1=0$ or, when written in a slightly more tangible form, $x^2=-1$. (Note: historically, complex numbers really saw their inception in cubic equations by Cardano and others.) It is easy to see that no such real valued $x$ can solve this equation because we know that when $x\in\Bbb R$, $x^2\ge 0$. (This proof is fairly straightforward using basic axioms of real numbers so I will not do it here.) Despite this fact, mathematicians wanted some way to factorize - solve - the above equation. Enter the imaginary number $i$. It is clear from the above argument that $i$ is most definitely not a real number.

Undoubtedly you have seen this mathematical object in some way, shape or form. $i$ was defined to solve exactly such an equation, i.e. $i^2=-1$. Why that particular equation? Why not $ i^2 = -2 $? The answer is by analogy: we know that the solutions to $ x^2 = 1 $ are $1$ and $-1$, which are fundamental units of the real numbers in a way. For the mathematically mature readers, $1$ is the unity for the ring of real numbers. Intuitively, this suggests that solving $ x^2 = -1 $ should give us an element for a new set of numbers that plays a similar role to $1$ for the real numbers in some way. Additionally, if we looked at a definition not of the form $ i^2 = -1 $, we would have to worry about extra factors and it would make the construction that much more taxing. This new set of numbers, deemed the "imaginary" numbers (I abhor this moniker, but it is what we are stuck with historically so I will use it), consists of numbers of the form $ ai$, where $a$ is a real number, i.e. rescalings of the unit $ i $, just like with $1$ in the real number case.

There are some issues with the set of imaginary numbers; namely, it is not closed under multiplication, i.e. if you multiply two imaginary numbers you do not get an imaginary number. In fact, if you multiply two imaginary numbers together you get a real number. (Check it yourself!) If we wish to create a set that includes the imaginary numbers that is closed under multiplication, we must include the real numbers and doing so gives us the complex numbers. A complex number $ z $ is of the form $ z = a + bi $, where $ a, b \in \mathbb{R} $, and $ a $ is called the real part of $ z $ and $ b $ is called the imaginary part of $ z $. It turns out that the product of two complex numbers is a complex number and the sum of two complex numbers is a complex number.

In fact, the complex numbers form a field. Since one can multiply complex numbers, one can also divide them. This fact is the reason why complex analysis is vastly different from calculus on $ \mathbb{R}^2 $ (I will make a brief post about this in the near future because it is worthwhile to discuss). Thus the complex numbers are very nice, and it turns out that every non-constant polynomial equation can be factored completely over the complex numbers. This is the fundamental theorem of algebra and it can be proved in a wealth of ways using complex analytic techniques. If this were the only use of complex numbers, it might not warrant an attempt at generalizing them, but it is only one of the many uses of complex numbers and, more generally, of complex analysis. Complex analysis is a very beautiful subject that has a wealth of utility in mathematics and physics, and therefore it is very natural to ask how complex numbers might be generalized to develop a more general analysis. The logic being that perhaps a more general structure will be as beautiful and useful, if not more so. This generalization is exactly the quaternions.

The quaternions sprung out of an idea that Sir William Rowan Hamilton had. He was one of the many people who worked on reformulating classical mechanics with the notion of least action; namely, he worked on translating Lagrangian mechanics into an energy-based formulation, and this reformulation of mechanics bears his name. I will not refer to the Cayley-Dickson construction when developing the quaternions because I feel one loses the insight into their development. The Cayley-Dickson construction came after and is used to further generalize the quaternions to what are known as the octonions and sedenions.

Initially what Hamilton set out to do was to generalize the complex numbers to three dimensions. Going from two dimensions (the complex plane) to three dimensions seems completely reasonable. There are two ways to do this: introduce another real number (think $(1,1,i)$ in "vector" notation) or introduce another imaginary number (think $(1,i,j)$ in "vector" notation). The first is a bit underwhelming and it doesn't capture the true nature of what Hamilton wanted to do. What he really desired was to have two imaginary units $ i $ and $ j $ so that any number in this new vector space would be written as $ z = a + bi + cj $, where $ a, b, c \in \mathbb{R} $. Further, Hamilton wanted $ i $ and $ j $ to be completely independent quantities that satisfied $ i^2 = j^2 = -1 $. In order to have a proper generalization of complex numbers, we would like to be able to multiply two of these three-tuples. If we were to multiply two of these three-tuples we would have the following:

$$ z_1z_2 = (a_1 + b_1i + c_1j)(a_2 + b_2i + c_2j).$$

Assuming the distributive property and associativity of multiplication we get the following expression for $ z_1 z_2$:

$$ z_1z_2 = (a_1a_2 - b_1b_2 - c_1c_2) + (a_2b_1 + a_1b_2)i + (a_1c_2+a_2c_1)j + b_1c_2ij + c_1b_2ji. $$

The problem becomes how does one then define the product $ ij $ and likewise $ ji $? If we want this space to be closed, we need both $ ij $ and $ ji $ to either be a real number, $i$ or $j$ or some linear combination thereof. Let's assume that $ ij = a + bi + cj $ and see if there are any glaring issues with this.

Notice that I did not assume that $ij = ji$ above because the multiplication may not be commutative. Let's take the previous expression and multiply on the left by $i$ and see what we get: $i(ij) = (ii)j = -j$, while $i(a + bi + cj) = -b + ai + c\,ij$. However we ended up with $ ij $ again, so let us substitute our expression for $ ij $ into the previous expression to get $ -j = (ac - b) + (a + bc)i + c^2 j $. Equating the coefficients of $j$, we would need $c^2 = -1$, which no real number $c$ satisfies. As I am sure you have guessed, trying to define the product $ji$ leads to the same contradiction. So we see that within this framework, multiplication of these three-tuples causes contradictions.

What then can be done to salvage this idea? Hamilton eventually came to the conclusion that a third - yes a third - imaginary number $k$ needs to be added to the set. The constraints on the imaginary numbers are thusly: $ i^2 = j^2 = k^2 = -1 $ and $ ijk = -1 $, with the requirement that $i$, $j$ and $k$ be independent imaginary numbers. We also require that multiplication be associative and that the distributive property holds. From this one can easily see that $ij = k$, $jk = i$ and $ki = j$. It can also be shown that the imaginary numbers anti-commute, i.e. $ij + ji = 0$ (and so on). With these definitions, we have established a space that is closed under addition and multiplication.
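For the curious, here is a small Python sketch of quaternion multiplication with quaternions stored as 4-tuples $(a, b, c, d) = a + bi + cj + dk$ (the representation and function name are my own choices), verifying the defining identities.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + b*i + c*j + d*k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(qmul(i, i), qmul(j, j), qmul(k, k))      # all (-1, 0, 0, 0)
print(qmul(qmul(i, j), k))                     # (-1, 0, 0, 0), i.e. ijk = -1
print(qmul(i, j), qmul(j, i))                  # (0,0,0,1) and (0,0,0,-1): ij = k = -ji
```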

The notation $i$, $j$ and $k$ seems very suggestive and may remind you of vectors in $ \mathbb{R}^3 $. In fact, vector calculus sprung out of quaternions. Hamilton battled fiercely to keep quaternions relevant in mathematics, but eventually vector notation would dominate. However, quaternions would subtly reemerge in a way in 1928. If you notice above, we have the cyclic relations $ij = k$, $jk = i$ and $ki = j$. These encode the right-hand rule for cross products, and so cross products can be represented with quaternions.

If $ (x_1, y_1, z_1), (x_2, y_2, z_2) $ are vectors in $ \mathbb{R}^3 $, then their cross product is given by

$$ (y_1z_2 - y_2z_1, -x_1z_2 + x_2z_1, x_1y_2 - x_2y_1).$$

If we rewrite these as quaternions with $0$ real component, associate the first component with $i$, second with $j$ and third with $k$ and multiply them we have

$$(-x_1x_2 - y_1y_2 - z_1z_2) + (y_1z_2-y_2z_1)i + (-x_1z_2 + x_2z_1)j + (x_1y_2 - x_2y_1)k.$$

If the real part is ignored, the cross product formula is exactly recreated. The keen observer might also note that the dot product is embedded in the real part (with a $-$ sign thrown in). It then becomes clear that quaternions and vectors in $ \mathbb{R}^3$ are closely-related.
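Using the same 4-tuple representation as a sketch (the helper and the sample vectors are my own choices), one can check that for two pure-imaginary quaternions the real part of the product is minus the dot product and the imaginary part is exactly the cross product.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (a, b, c, d) = a + b*i + c*j + d*k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, -1.0, 2.0])

prod = qmul((0.0, *v1), (0.0, *v2))            # embed vectors as pure quaternions
print(prod[0], -np.dot(v1, v2))                # real part = -(v1 . v2)
print(prod[1:], np.cross(v1, v2))              # imaginary part = v1 x v2
```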

Quaternions may seem a bit silly and cumbersome and merely mathematical toys, but they are very useful for doing rotations in computer graphics. With rotations parameterized by Euler angles, gimbal lock arises, but - for a reason unbeknownst to me - quaternions allow one to get around this issue with ease.

Quaternions also arise when one considers the Klein-Gordon equation in relativistic quantum mechanics. The Klein-Gordon equation was the first relativistic wave equation that was derived - in fact, Schrödinger first derived it but abandoned it for the equation that now bears his name - but it appeared to have some deep philosophical issues. Namely, it appeared to allow for negative probabilities. This presumed issue arose from the fact that there is a second derivative with respect to time in the equation. Paul Dirac identified this "issue" and sought out a solution. His solution was to "factor" the Klein-Gordon equation into a product of operators acting on what are now referred to as spinors (which is the constructive way to arrive at spin states in quantum mechanics). In doing so, he ended up with matrices satisfying a very specific set of identities, namely the ones satisfied by the quaternions (up to a scale factor). His equation would come to be known as the Dirac equation. In this context the quaternions are realized by the Pauli matrices: the matrices $-i\sigma_x$, $-i\sigma_y$ and $-i\sigma_z$ multiply exactly like $i$, $j$ and $k$ do.

The quaternions are a very rich structure which is somewhat unappreciated by mathematicians and physicists alike. However, they receive more notice than the split-complex numbers. The split-complex (or hyperbolic) numbers come from considering the equation $x^2 = 1 $ and finding solutions to it. Of course $1$ and $-1$ are solutions, but it is posited that there exists another solution $j$ such that $j^2 = 1$ and $j$ is neither $1$ nor $-1$ ($j$ is also not a complex number, because the only complex numbers that square to $1$ are $-1$ and $1$). The split-complex numbers can be used to represent Minkowski spacetime in $1+1$ dimensions. These can also be generalized further to include three hyperbolic numbers (called the split-quaternions), which can be used to describe Minkowski spacetime in $3+1$ dimensions - three spatial dimensions and one time dimension.

This ends the post. The brief digression into the hyperbolic numbers was merely to expose the reader to them because it is a very strange and cool idea, just like the complex numbers. While the complex numbers, quaternions, hyperbolic numbers and split-quaternions are rich objects in abstract algebra, they each find their way into physics as a means to describe the world around us. There is much more that could be said about them in the language of abstract algebra but it can become overly dense and dry for the casual reader and so I shall end the discussion here. My next post will be a shorter one - I hope - about the nature of calculus on $ \mathbb{R}^2 $ and why it is so different from calculus on $\mathbb{C} $.