How to tell time using GPS, and the recent Collins glitch

Introduction

Last week, on Sunday 9 June, hundreds of airplanes were affected by a failure in some variants of the Collins Aerospace GPS-4000S sensor, grounding many business jets and even causing some delays among regional airlines.

Following is the initial description of the problem from Collins, as I first saw it quoted in Aviation International News:

“The root cause is a software design error that misinterprets GPS time updates. A ‘leap second’ event occurs once every 2.5 years within the U.S. Government GPS satellite almanac update. Our GPS-4000S (P/N 822-2189-100) and GLU-2100 (P/N 822-2532-100) software’s timing calculations have reacted to this leap second by not tracking satellites upon power-up and subsequently failing. A regularly scheduled almanac update with this ‘leap second’ was distributed by the U.S. government on 0:00 GMT Sunday, June 9, 2019, and the failures began to occur after this event.”

This sounded very suspicious to me… but suspicious in a mathematically interesting way, hence the motivation for this post.

The suggestion, re-interpreted and re-propagated in subsequent aviation news articles and comment threads, seems to be that the gummint introduced a “leap second event” in the recent GPS almanac update, and the Collins receivers weren’t expecting it. The “almanac” is a periodic, low-throughput message broadcast by the satellites in the GPS constellation, that newly powered-on receivers can use to get an initial “rough idea” of where they are and what time it is.

It is true that one of the fields in the almanac is a count of the number of leap seconds added to UTC time since the GPS epoch, that is, since the GPS “clock started ticking” on 6 January 1980 at 00:00:00 UTC. So what’s a leap second? Briefly, the earth’s rotation is slowing down, and so to keep our UTC clocks reasonably consistent with another astronomical time reference known as UT1, we periodically “delay” the advance of UTC time by introducing a leap second, which is an extra 61st second of the last minute of a month, historically either at the end of June or the end of December.

There have been 18 leap seconds added to UTC since 1980… but leap seconds are scheduled months in advance, and it has already been announced that there will not be a leap second at the end of this month, and there certainly wasn’t a leap second added this past June 9th.

So what really happened last week? The remainder of this post is purely speculation; I have no affiliation with Collins Aerospace nor any of its competitors, so I don’t have any knowledge of the actual software whose design was reported to be in error. This is just guesswork from a mathematician with an interest in aviation.

GPS week number roll-over

The Global Positioning System has an interesting problem: it’s hard to figure out what time it is. That is, if your GPS receiver wants to learn not just its location, but also the current time, without relying on or comparing with any external clock, then that is somewhere between impossible and difficult, depending on how fancy we want to get. The only “timestamp” that is broadcast by the satellites is a week number (indicating the integer number of weeks elapsed since the GPS epoch on 6 January 1980) and a number of seconds elapsed within the current week.

The problem is that the message field for the week number is only 10 bits long, meaning that we can only encode week 0 through week 1023. After that, on “week 1024,” the odometer rolls over, so to speak, back to indicating week 0 again.

This has already happened twice: the first roll-over was between 21 and 22 August 1999, and the second was just a couple of months ago, between 6 and 7 April 2019. An old GPS receiver whose software had not been updated to account for these roll-overs might show the time transition from 21 August 1999 “back” to 6 January 1980, for example.

It’s worth noting that those roll-overs didn’t actually occur exactly at midnight… or at least, not at midnight UTC. The GPS “clock” does not include leap seconds, but instead ticks happily along, always exactly 60 × 60 × 24 × 7 = 604,800 seconds per week. So, for example, GPS time is currently “ahead” of UTC time by 18 seconds, corresponding to the 18 leap seconds that have contributed to the “slowing” of the advance of the UTC clock. The GPS week number most recently rolled over on 6 April, not at midnight, but at 23:59:42 UTC.
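To make that arithmetic concrete, here is a minimal Python sketch of the conversion from a GPS week count to UTC. The function name `gps_to_utc` is just something I made up for illustration, and the offset of 13 seconds for 1999 is not mentioned above: it is the leap second count that was in effect at the first roll-over.

```python
from datetime import datetime, timedelta

# GPS epoch: 6 January 1980 at 00:00:00 UTC, when GPS time and UTC coincided.
GPS_EPOCH = datetime(1980, 1, 6)
SECONDS_PER_WEEK = 60 * 60 * 24 * 7  # 604,800, with no leap seconds ever

def gps_to_utc(week, seconds_of_week, leap_seconds):
    """Convert an absolute (not rolled-over) GPS week and seconds-of-week
    to UTC, given the number of seconds by which GPS is ahead of UTC."""
    gps_time = GPS_EPOCH + timedelta(weeks=week, seconds=seconds_of_week)
    return gps_time - timedelta(seconds=leap_seconds)

# The two roll-overs happen at the start of weeks 1024 and 2048.
print(gps_to_utc(1024, 0, 13))  # 1999-08-21 23:59:47
print(gps_to_utc(2048, 0, 18))  # 2019-04-06 23:59:42
```

Note that the receiver only ever sees the week number modulo 1024, so both of these instants are broadcast as “week 0.”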

Using the leap second count

It turns out that we could modify our GPS receiver software to extend its ability to tell time beyond a single 1024-week cycle, by using the leap second count field together with the rolling week number. The idea is that by predicting the addition of more leap seconds at reasonably regular intervals in the future, we can use the week number to determine the time within a 1024-week cycle, and use the leap second count to determine which 1024-week cycle we are in.

This is not a new idea; there is a good description of the approach in a patent application by Trimble, a popular manufacturer of GPS receivers. Given the current week number 0 \leq W < 1024 and the leap second count L in the almanac, the suggested formula for the “absolute” number of weeks W' since the GPS epoch is given by

W' = t + ((W - t + 512) \bmod 1024) - 512

t = \lfloor 84.56 + 70.535L \rfloor

where the intermediate value t is essentially an estimate of the week number obtained by a linear fit against the historical rate of introduction of leap seconds.
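As a minimal sketch, the formula translates almost directly into Python. The function names here are my own, chosen for illustration, and are not taken from the patent application.

```python
import math

def estimate_week(leap_seconds):
    """Linear-fit estimate t of the absolute week number from the leap second count L."""
    return math.floor(84.56 + 70.535 * leap_seconds)

def absolute_week(w, leap_seconds):
    """Recover the absolute week W' from the broadcast week 0 <= w < 1024.

    Python's % always returns a non-negative result for a positive modulus,
    which is what the formula needs; languages where % can be negative
    (C, for example) would need an extra adjustment.
    """
    t = estimate_week(leap_seconds)
    return t + ((w - t + 512) % 1024) - 512

# With 13 leap seconds (the count throughout 1999), t = 1001.
print(absolute_week(1000, 13))  # 1000 (before the first roll-over)
print(absolute_week(5, 13))     # 1029 (after it: 1024 + 5)
```

In other words, the formula picks the unique week in the 1024-week window from t - 512 through t + 511 that agrees with the broadcast week number, so it is only as good as the estimate t.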

This formula was proposed in 1996, and it would indeed have worked well past the 1999 roll-over. But although it was predicted to “provide a solution to the GPS rollover problem for about 173 years,” unfortunately it would really only have lasted for about 12 years, first yielding an incorrect time in week 1654, on 18 September 2011. By then the leap second count was still only L = 15, giving t = 1142, and the formula can only return week numbers from t - 512 = 630 through t + 511 = 1653.
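The failure week can be checked with a brute-force scan, sketched below under a few assumptions of mine. The table of leap second dates comes from the published IERS announcements rather than from anything above. The leap second count is sampled at the start of each week, which ignores the few days between a mid-week leap second and the next week boundary. The name `first_failure` is made up, and the scan stops at week 2057, since later leap seconds are not known as I write this.

```python
import math
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)

# First UTC day on which each of the 18 leap seconds since the GPS epoch
# was in effect (each leap second is inserted at the end of the previous day).
LEAP_DAYS = [date(y, m, 1) for y, m in [
    (1981, 7), (1982, 7), (1983, 7), (1985, 7), (1988, 1), (1990, 1),
    (1991, 1), (1992, 7), (1993, 7), (1994, 7), (1996, 1), (1997, 7),
    (1999, 1), (2006, 1), (2009, 1), (2012, 7), (2015, 7), (2017, 1)]]

def first_failure(a, b, max_week=2058):
    """Return (week, start date) of the first week for which the estimate
    t = floor(a + b*L) recovers the wrong absolute week, or None."""
    for true_week in range(max_week):
        start = GPS_EPOCH + timedelta(weeks=true_week)
        leap = sum(d <= start for d in LEAP_DAYS)   # L at start of week
        t = math.floor(a + b * leap)
        w = true_week % 1024                        # what is broadcast
        recovered = t + ((w - t + 512) % 1024) - 512
        if recovered != true_week:
            return true_week, start
    return None

print(first_failure(84.56, 70.535))  # (1654, datetime.date(2011, 9, 18))
```

Passing a different pair of coefficients to `first_failure` shows how long any other linear fit of this form would have survived against the actual leap second history.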

The problem is shown in the figure below, indicating the number of leap seconds that have been added over time since the GPS epoch in 1980, with the red bars indicating the “week zero” epoch and roll-overs up to this point:

Leap seconds added to UTC time since GPS epoch, with GPS epoch and 1024-week roll-overs shown in red.

Right after the first roll-over in 1999, the reasonably regular introduction of leap seconds stopped, and even once they started to become “regular” again, they were regular at a lesser rate (although still more frequently than the “2.5 years” suggested by the Collins report).

Conclusion

Could something like this be the cause of this past week’s sensor failures? It’s certainly possible: it’s a relatively simple programming exercise to search for different linear fit coefficients in the above formula (a variant of which might have been used on these Collins receivers) that

  1. Yield the correct absolute week number for a longer time than the above formula, including continuing past the second roll-over this past April; but
  2. Yield the incorrect absolute week number, for the first time, on 9 June (i.e., the start of week 2057).

Such coefficients aren’t hard to find; for example, the reader can verify that the following satisfies the above criteria:

t = \lfloor -291 + 102L \rfloor

which corresponds to an estimate of one additional leap second approximately every 2 years.
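As a quick check of the arithmetic (just a worked example, not anything known about the Collins software): with the current leap second count L = 18,

t = \lfloor -291 + 102 \cdot 18 \rfloor = 1545

so the formula can only return week numbers from t - 512 = 1033 through t + 511 = 2056. Week 2056 is broadcast as W = 2056 \bmod 1024 = 8, which is recovered correctly:

W' = 1545 + ((8 - 1545 + 512) \bmod 1024) - 512 = 1545 + 1023 - 512 = 2056

But week 2057 is broadcast as W = 9, and falls off the other end of the window:

W' = 1545 + ((9 - 1545 + 512) \bmod 1024) - 512 = 1545 + 0 - 512 = 1033

That is exactly 1024 weeks too early, corresponding to a date in October 1999.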

Edit: See Steve Allen’s comment (and my reply) below for a description of what I think is probably a more likely root cause of the problem: still related to interpreting the combination of week number roll-over and leap second occurrences, but with a slightly different failure mode.


3 Responses to How to tell time using GPS, and the recent Collins glitch

  1. Steve Allen says:

    As of last week it had been 128 weeks since the previous leap second, and no next leap second is scheduled. GPS subframe 4 page 18 word 9 allots 8 bits for WN_LSF, the difference between the current GPS week number and the next/previous week number. So as of last week the difference no longer fits in the GPS navigation message. IS-GPS-200J describes that this can happen, and most receivers got it right, but Rockwell Collins failed at doing modulo arithmetic.

    • Interesting, thanks! If I understand correctly, the relevant part of IS-GPS-200J is section 20.3.3.5.2.4: “The CS shall manage these parameters such that, when [the previous and next leap second deltas] differ, the absolute value of the difference between the untruncated WN and WNLSF values shall not exceed 127.” This doesn’t say anything about the semantics when the previous/next deltas *don’t* differ (which they currently do not), but at any rate last week we were in the situation where the “true” untruncated WN and WNLSF values (2057 and 1929, respectively) differed by 128, and no re-interpretation (i.e., adding/subtracting 256 to account for the 8-bit truncation) would bring them any closer.

      If this is indeed the root cause (which seems pretty likely to me), then if the receivers are now “working” this week, I’m guessing that they might *still* incorrectly “think” (at least implicitly within their internal calculations) that the closest leap second is actually in the *future*, at the end of 2021-11-27, 256 weeks after the actual leap second on 2016-12-31, since that future untruncated WNLSF would be 127 weeks from “now.”

  2. The standupmaths channel did a video about the April 1024-week rollover when it happened: Why didn’t GPS crash? (Which is the only reason I was already aware of it.)
