Last week a co-worker observed confusing behavior from the uniform random number generator in Simulink. The output of the generator “did not seem random.” This is what she saw:
This is certainly uniform, but I would agree that this does not appear to be “random.” (Note: there is a problem with the setup here, though, which we will get to shortly. Can you see it?) I wanted to understand this behavior better, and in the process I found some interesting things about the algorithm used in the generator.
I don’t use Simulink. But I do use MATLAB, which supports– and documents— several different algorithms for generating random numbers. This is why I was surprised to find that the corresponding documentation on Simulink’s uniform random number generator provides no detail, let alone configurable options, regarding the algorithm used. [Edit: Looking at this again, the documentation has now been updated to indicate the algorithm being used, described below.]
Fortunately, we can reverse-engineer the algorithm without too much detective work. First, the regularity observed in the figure above strongly suggests a linear congruential generator, and even more fortunately, MATLAB provides such a generator that, after some experimenting, exactly reproduces the behavior in Simulink. That generator is the ‘mcg16807’ or ‘v4’ legacy generator that was the default through MATLAB version 4.0. In other words, the latest version of Simulink today provides only a single generator… that MATLAB left behind over 15 years ago. Huh.
The ‘mcg16807’ generator is essentially the MINSTD generator, in which the state of the generator traverses the elements of the (cyclic) multiplicative group of integers modulo a prime:
where in this case and . Integer values of the state in the range are transformed to random real values output from the generator in the unit open interval according to
So far, none of this is new or interesting. The interesting wrinkle is in how the generator is seeded. In all implementations I have seen, the seed is simply mapped identically to the initial state of the generator. But in the MATLAB/Simulink implementation, seeding appears to be slightly more complicated.
How can we tell? Because if we were to simply set the generator’s initial state equal to the seed, as is usually the case, then the figure above would have looked even worse:
These 100 “random” values simply increase linearly in really small steps of , or about 0.0000000005. This vividly highlights the potential danger, alluded to earlier, of using this approach to seeding to control random number generation in, say, multiple iterations of a Monte Carlo simulation. (And yet, I frequently see analysis that involves exactly this practice, just with different generators. Food for thought: how do we get away with it? Or are we getting away with it?)
Getting back to the problem of determining the more complex seeding algorithm in the Simulink generator: we have a “gray box” random number generator that, when seeded with a value , yields a first random output , where
By solving this congruence for a few appropriately selected seeds, we get a collection of pairs; the problem is to describe the function . After some trial and error, it turns out that this function is reasonably simple… but with a strange twist. The basic idea is that a given 32-bit integer seed is transformed into a 31-bit state by exchanging the high and low-order 16-bit “halves” of the seed… with a few necessary special cases to ensure that the resulting state is in the valid range .
Following is MATLAB code that reproduces the behavior of the Simulink generator:
classdef MCG16807 < handle properties a = 7^5; % 16807 m = 2^31 - 1; state; end methods function self = MCG16807(s) self.seed(s); end function  = seed(self, s) switch s case 0 self.state = 1144108930; case 2^32 - 1 self.state = self.m - 1; otherwise self.state = min(rem(s * 2^16 + floor(s / 2^16) + ... bitand(s, 2^15), 2^31), self.m - 1); end end function r = rand(self) self.state = rem(self.a * self.state, self.m); r = self.state / self.m; end end end
I suppose swapping word halves like this is understandable, since it is a simple way to mitigate somewhat the undesirable effect above, where often-used “nearby” seeds yield very “nearby” sub-sequences of random draws. But in an attempt to be helpful, I think it ends up merely hiding the still-present potential danger inherent in this mis-use of a sub-standard generator.
The “twist,” though, is that weird extra addition of bitand(s, 2^15). That is, to compute the state from the seed, we do not just swap word halves, we also add (arithmetically, not bit-wise) the masked 15th bit of the seed. But why?
The reasoning behind this oddly-specific-yet-seemingly-arbitrary extra step is not clear to me. My guess, though– which is admittedly cynically biased– is that it has the feel of a rather half-hearted attempt to further “mix” the transformation from seed to state… in a way that rarely has a useful effect.