I encountered the following MATLAB code recently, simplified for this discussion; it builds a 3-by-4-by-2 array by assigning each of its three 4-by-2 “block” sub-arrays in turn to an initially empty array:
sz = [3, 4, 2]; block = reshape(0:(prod(sz(2:end)) - 1), sz(2:end)); a = ; for k = 1:sz(1) a(k,:,:) = block; block = block + numel(block); end
As this example shows, MATLAB allows resizing arrays on the fly, so to speak; assignment to an element– or in this case, a range of elements– whose indices would normally be out of bounds automatically re-allocates memory to accommodate the new, larger array. However, these incremental re-allocation and re-copying operations can be costly in terms of execution time. Granted, this toy example is small enough not to matter, but in the “real” example, the eventual array size was approximately
sz=[256,16384,4], in which case executing the above code takes about 8 seconds on my laptop.
Pre-allocation is usually recommended as a fix: that is, size the entire array in advance, before assigning any elements. But what if you don’t know the size of the array in advance? Although there are several approaches to dealing with this situation, with varying complexity, the subject of this post is to describe just how much execution time may be eliminated by only a modest change to the above code.
A major source of slowness is that the array expansion occurs along the first dimension… which in MATLAB, with its Fortran roots, is the index that changes most frequently in the underlying block of memory containing the array elements. So not only must we re-allocate memory to accommodate each new sub-array, even copying the existing array elements is a non-trivial task, as the following figure shows. As we “append” first the cyan block, then the red block, then the green block, each incremental re-allocation also requires “re-interleaving” the old and new elements in the new larger block of memory:
Instead, consider the following code, modified to append each block along the last— and most slowly changing– dimension, followed by the appropriate transposition via
permute to shuffle everything back into the originally intended shape:
a = ; for k = 1:sz(1) a(:,:,k) = block; block = block + numel(block); end a = permute(a, [3, 1, 2]);
The effect of this change is that, although we still incur all of the incremental re-allocation cost, each memory copy operation is a more straightforward and much faster “blit”:
This version executes in less than a quarter of a second on my laptop, about 35 times faster than the original 8 seconds.