## Beware the “natural” quaternion

Introduction

Rotation math can be confusing. But it didn’t need to be this confusing.

I think the reason that 3D rotations can be tricky to work with is that there are so many choices of convention– in interpretation, notation, and implementation– and we often do not communicate or document all of those choices clearly and completely. Are we “actively” rotating vectors, or “passively” rotating frames? If the latter, are we rotating from the “body” to “world” frame, or from world to body? Are rotations applied as we write them left to right, or right to left? Are the rotations right-handed or left-handed? Etc.

This post is motivated by lost and delayed productivity, stemming from confusion and miscommunication about one of these choices that seems not widely known: there are two different quaternions out there in the wild. The unit quaternion, familiar to most of us as a way to represent rotations in many application domains, has an evil twin. If you didn’t already know this, then hopefully this post is a helpful warning to double-check your documentation and code. But even if this is old news for you, then I will try to argue that use of this evil twin quaternion, advocated by Shuster [1] as the “natural” convention, was an unnecessary choice… and certainly not any more “natural,” “physical,” or otherwise inherently preferable to the quaternion most of us are familiar with.

Rotation matrices

To try to show more clearly how silly this situation is, let’s think like a software developer, working out how to implement rotations for some application. Let’s start with the interface first, then we will flesh out the implementation. We can view the concept of rotation as a group acting on $\mathbb{R}^3$, with a particular rotation represented as a group element $g$, that transforms a vector $v \in \mathbb{R}^3$ into a new rotated vector $g(v)$. For example, in Python, we could make a Rotation object callable:

class Rotation:
...

def __call__(self, v):
"""Rotate vector v."""
return ...

g = Rotation(...)
v = (1, 0, 0)
v_rotated = g(v)


But wait– we’ve already made several choices of convention here, at least implicitly. First, from this software implementation perspective, we are bypassing the discussion of interpretation altogether. That is, for example, whether the user wants to interpret an argument vector $v \in \mathbb{R}^3$ as being “actively” rotated within a fixed coordinate frame, or as a fixed point in space whose coordinates are being “passively” rotated from one frame to another, does not affect the eventual lines of code implementing the arithmetic operations realizing the rotation. (Or put another way, those interpretation details get pushed into the constructor, affecting how we want to interpret the parameters specifying creation of a Rotation. More on this later.)

However, we have made a definite choice about notation: namely, the left group action convention, where we write (from left to right) the rotation group element $g$ first, followed by the vector $v$ being acted upon.

This is okay; no one is complaining about this choice of convention. For example, if we now consider actually implementing a rotation as a 3×3 matrix $R$ (we’ll get to quaternions shortly), pretty much everyone agrees with and has settled on the well-established convention of multiplying the matrix $R$ on the left by the column vector $v$ on the right, so that the group action $g(v)$ corresponds to the matrix multiplication $Rv$:

import numpy as np

class Rotation:
def __init__(self, matrix):
"""Create rotation from matrix."""
self.r = np.array(matrix)

def __call__(self, v):
"""Rotate vector v."""
return self.r @ np.array(v) # matrix multiplication

g = Rotation(((0, -1, 0), (1, 0, 0), (0, 0, 1)))
v = (1, 0, 0)
v_rotated = g(v)


(As an aside, the only reasonably common application context I can think of where it’s more common to multiply a row vector on the left by a matrix on the right is a Markov chain, where we update the probability distribution as a row vector multiplied by the transition matrix. Maybe there are others that I’m not aware of?)

Composition of rotations

Our implementation isn’t quite complete. We still need to work out how to compose multiple rotations. Fortunately, there is again community-wide agreement on how this works with rotation matrices, and it’s consistent with the same left group action convention. Namely, composition (just like the group action on vectors) is denoted and implemented as the usual left-to-right matrix multiplication:

import numpy as np

class Rotation:
def __init__(self, matrix):
"""Create rotation from matrix."""
self.r = np.array(matrix)

def __call__(self, v):
"""Rotate vector v."""
return self.r @ np.array(v)

def __matmul__(self, r2):
"""Compose rotations (this @ r2)."""
return Rotation(self.r @ r2.r)

r1 = Rotation(((0, -1, 0), (1, 0, 0), (0, 0, 1)))
r2 = Rotation(((0, 0, 1), (0, 1, 0), (-1, 0, 0)))
v = (1, 0, 0)

v_rotated = (r1 @ r2)(v)
v_rotated = r1(r2(v)) # same result


Note a couple of observations before we move on. First, the example above highlights a potentially confusing side effect of this left group action convention: in the composition $R_1 R_2$, the rightmost rotation $R_2$ is the first to act on the input vector $v$.

Second, recall that matrix multiplication is defined so that each element in the matrix product $C=AB$ is computed as $c_{i,j}=\sum_k a_{i,k}b_{k,j}$. This is technically another arbitrary choice of convention; we could in principle choose to abandon this convention, and define “our” multiplication as $c_{i,j}=\sum_k a_{k,i}b_{j,k}$. The reader can verify that the effect of changing this convention is simply to reverse the order in which we would write a “historically conventional” well-defined matrix product.

Maybe we’ve been using the “wrong” definition of matrix multiplication for the last two centuries or so, and the above redefinition is more “natural”?

Keep the interface, change the implementation

Now that we have a working software implementation using rotation matrices, that corresponds nicely with cocktail-napkin notation, suppose that we want to change that implementation to use quaternions instead of matrices under the hood… but without changing the interface exposed to the user.

First, let’s review how unit quaternions work as rotation operators. Just as the group SO(3) of rotation matrices acts on vectors by multiplication, the group SU(2) of unit quaternions of the form $q=w+x\mathbf{i}+y\mathbf{j}+z\mathbf{k}$ acts on vectors by conjugation, with a unit quaternion $q$ transforming a vector $v$ into the rotated vector $qvq^{-1}=qvq^{*}$, where we temporarily convert vectors in $\mathbb{R}^3$ to and from corresponding quaternions with zero real part, and the non-commutative quaternion multiplication is defined by

$\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1$

Note that this form of conjugation, $qvq^{-1}$, is also a choice of convention. We could have used $q^{-1}vq$ instead… but here again, most everyone seems to agree the above is the more sensible choice, since it keeps us consistent with the left group action notational convention, allowing us to comfortably switch back and forth between thinking in terms of rotation matrices or quaternions as our underlying implementation.

Here is the result:

import numpy as np

def quat_product(q1, q2):
"""Return quaternion product (q1 q2)."""
w1, xyz1 = q1
w2, xyz2 = q2
return (w1 * w2 - np.dot(xyz1, xyz2),
w1 * xyz2 + w2 * xyz1 + np.cross(xyz1, xyz2))

class Rotation:
def __init__(self, quat):
"""Create rotation from quaternion."""
self.q = (quat[0], np.array(quat[1]))

def __call__(self, v):
"""Rotate vector v."""
w, xyz = quat_product(quat_product(self.q, (0, np.array(v))),
(self.q[0], -self.q[1]))
return xyz

def __matmul__(self, r2):
"""Compose rotations (this @ r2)."""
return Rotation(quat_product(self.q, r2.q))

c = np.sqrt(0.5)
r1 = Rotation((c, (0, 0, c)))
r2 = Rotation((c, (0, c, 0)))
v = (1, 0, 0)

v_rotated = (r1 @ r2)(v)
v_rotated = r1(r2(v)) # same result


Note that I’ve pulled out a helper function, quat_product, that multiplies two quaternions, using the product as defined above and originally set down by Hamilton nearly two centuries ago.

But this is yet another technically arbitrary choice of convention. There is another definition of the quaternion product, advocated by Shuster [1], and in wide use by the spacecraft dynamics community, that changes the product relations from the “Hamilton convention” of

$\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1$

to what Sommer et. al. [2] call the “Shuster convention” (sometimes referred to as the “JPL convention,” denoted here as the operator $\otimes$) of

$\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1, \mathbf{i}\otimes\mathbf{j}\otimes\mathbf{k} = +1$

Mathematically speaking, there is nothing wrong with this. (Side puzzle: this doesn’t contradict the Frobenius theorem, right?) However– and finally we come to the point of this discussion– there is also nothing more inherently Right, or Natural, about it, either.

The effect, and indeed the motivation, of using Shuster’s convention is simply to reverse the order of the quaternion product. That is, given two quaternions $q_1,q_2$ with Hamilton product denoted by juxtaposition $q_1 q_2$, the reader can verify that their “Shuster product” $q_1 \otimes q_2 = q_2 q_1$.

In the quat_product implementation above, we could realize the Shuster product by flipping the plus sign on the cross product to a minus sign… and that’s it. Nothing else changes… except, possibly, how we might choose to interpret the parameterization of our rotations in the constructor.

Homomorphism vs. antihomomorphism

So, why mess with two centuries of notational inertia just to write products in reverse order? To see what I think is the root of the problem, note that I cheated somewhat in the quaternion implementation of the Rotation class above. I claimed not to change the interface at all, but I did: the original rotation matrix implementation’s constructor expected a matrix argument, while the quaternion version expects a quaternion argument.

One way to fix this would be to keep the original expectation of a matrix as the constructor argument, and convert it to the corresponding quaternion under the hood. Or conversely, in the matrix implementation, we would accept a quaternion as the constructor argument, and convert it to the corresponding matrix.

But what do we mean by conversion to the corresponding matrix (or quaternion)? We want the external interface to remain unchanged, so that we preserve behavior as observed by the user, no matter which internal implementation we use. In mathematical terms, we need a homomorphism: a conversion function $\phi : SU(2) \to SO(3)$, with the property that for all unit quaternions $q_1, q_2$, converting them to rotation matrices preserves the behavior of composition, that is, $\phi(q_1 q_2)=\phi(q_1)\phi(q_2)$.

So, consider the situation at this point. We have two choices of convention to make:

1. What is the quaternion product: do we use Hamilton’s that has been around for nearly two centuries and is in common use in mathematical literature and software libraries, or Shuster’s that is mathematically equally workable but much more recent and much less widely known?
2. What is the conversion function $\phi$ described above?

The key observation is that, if we want our conversion function $\phi$ to be a homomorphism, then fixing the choice of convention in either (1) or (2) determines the choice in the other. And this whole mess, that on the surface is Shuster’s “natural” quaternion product as choice (1), really stems from the prematurely fixed, equally arbitrary choice (2) for the correspondence between quaternions and matrices.

In other words, the complaint seems to be, “I’ve already fixed my assumed conversion function for some reason, and I want it to be a homomorphism, so I need the quaternion product definition to change accordingly.” But if we simply transpose Shuster’s assumed quaternion-to-matrix conversion– which he refers to in equation (4) of [1], but does not actually define until equation (61), without ever noting the implicit conventional choice in doing so– then the Hamilton product does indeed “satisfy… the natural order” of multiplication as desired.

Conclusion

Unfortunately, the damage is done. That is, rants like this aren’t going to fix all of the papers and lines of code that already exist that use both conventions, so we can do no better than double-check assumptions and conventions whenever we encounter a new paper, software library, etc.

There are even examples of more intermediate weirdness, so to speak, to be on the lookout for. For example, MATLAB’s Robotics Toolbox is reasonably well-behaved, consistently using the Hamilton product (robotics.utils.internal.quatMultiply) and the corresponding homomorphic conversion functions (quat2rotm and rotm2quat)… but in the Aerospace Toolbox, despite also using the Hamilton product (quatmultiply), the analogous conversion functions (quat2dcm and dcm2quat) are seemingly antihomomorphic– i.e., quat2dcm(q) returns the transpose of quat2rotm(q)— except for the added wrinkle that the provided rotation function quatrotate uses the conjugation $q^{-1}vq$ consistent with the right group action convention (i.e., so that when composing rotations, matrix products look reversed compared to their corresponding quaternion products).

References:

1. Shuster, M., The nature of the quaternion, The Journal of the Astronautical Sciences, 56(3) 2008, 359–373
2. Sommer, Gilitschenski, Bloesch, Weiss, Siegwart, Nieto, Why and How to Avoid the Flipped Quaternion Multiplication, arXiv:1801.07478
This entry was posted in Uncategorized. Bookmark the permalink.

This site uses Akismet to reduce spam. Learn how your comment data is processed.