Sunday, December 30, 2012

Multidimensional arrays in C

I'm writing this mostly out of frustration with bad code I keep seeing.

Assume you want to allocate a multidimensional array of integers, rows R, columns C, for a total of N=R×C elements. There are a number of ways of doing this, with different pros and cons, and some pretty serious pitfalls to stumble into.

Stack Array

int array[R][C];

The naive approach, but also the fastest. If you can get away with this, it's what you want to use, with the caveats that N must be small enough not to use up all the stack space, and that R and C need to be known at compile time.

Array of Pointers (a.k.a. The Wrong Way)

int** array = calloc(R, sizeof(*array));
for(int r = 0; r < C; r++) {
  array[r] = calloc(C, sizeof(**array));
}

Unless you want a staggered array (which you almost never want), DON'T DO THIS. I can't emphasize how bad this is. It needlessly fragments the heap. It's prone to memory leaks. It's slow to allocate, with √ N malloc() operations. It's slow to access, with data scattered across the heap -- a cache nightmare.

Manual Indexing of Dynamically Allocated Single-Dimensional Array

int* array = calloc(R*C, sizeof(*array));

Perhaps the most versatile method. For a simple array, access is done with index c + C*r, but other indexing functions can be designed for staggered arrays of a previously known shape.

Pointer-To-Array

int (*array)[R][C] = malloc(sizeof(*array));

This is perhaps the least common method, accessed with (*array)[r][c]. The method shares stack arrays' requirement for compile-time known dimensions, but doesn't reside in the stack, so size is irrelevant. Indexing is done automatically.

Here's me talking out of my ass. Kept it for discussion purposes.

The reason you'll want to know about this is when you pass stack arrays as function arguments. Consider this function prototype.

void foo(int arg[R][C])

Are you furiously screaming at the screen? You should be. A function prototyped like that will copy its array argument into its stack frame upon invocation. The argument, to be extra clear is not a pointer to an array, but an actual array.

We clearly don't want that, as even if your program mysteriously doesn't die from running out of stack space, it will run orders of magnitude slower than it should.

The correct syntax for passing a stack array by reference is

void foo(int (*arg)[R][C]);

int arry[R][C];
foo(&arry);