Sunday, December 30, 2012

Multidimensional arrays in C

I'm writing this mostly out of frustration with bad code I keep seeing.

Assume you want to allocate a multidimensional array of integers, rows R, columns C, for a total of N=R×C elements. There are a number of ways of doing this, with different pros and cons, and some pretty serious pitfalls to stumble into.

Stack Array

int array[R][C];

The naive approach, but also the fastest. If you can get away with this, it's what you want to use, with the caveats that N must be small enough not to use up all the stack space, and that R and C need to be known at compile time.

Array of Pointers (a.k.a. The Wrong Way)

int** array = calloc(R, sizeof(*array));
for(int r = 0; r < C; r++) {
  array[r] = calloc(C, sizeof(**array));
}

Unless you want a staggered array (which you almost never want), DON'T DO THIS. I can't emphasize how bad this is. It needlessly fragments the heap. It's prone to memory leaks. It's slow to allocate, with √ N malloc() operations. It's slow to access, with data scattered across the heap -- a cache nightmare.

Manual Indexing of Dynamically Allocated Single-Dimensional Array

int* array = calloc(R*C, sizeof(*array));

Perhaps the most versatile method. For a simple array, access is done with index c + C*r, but other indexing functions can be designed for staggered arrays of a previously known shape.

Pointer-To-Array

int (*array)[R][C] = malloc(sizeof(*array));

This is perhaps the least common method, accessed with (*array)[r][c]. The method shares stack arrays' requirement for compile-time known dimensions, but doesn't reside in the stack, so size is irrelevant. Indexing is done automatically.

Here's me talking out of my ass. Kept it for discussion purposes.

The reason you'll want to know about this is when you pass stack arrays as function arguments. Consider this function prototype.

void foo(int arg[R][C])

Are you furiously screaming at the screen? You should be. A function prototyped like that will copy its array argument into its stack frame upon invocation. The argument, to be extra clear is not a pointer to an array, but an actual array.

We clearly don't want that, as even if your program mysteriously doesn't die from running out of stack space, it will run orders of magnitude slower than it should.

The correct syntax for passing a stack array by reference is

void foo(int (*arg)[R][C]);

int arry[R][C];
foo(&arry);

3 comments:

  1. I'm pretty sure void foo(int arg[R][C]); isn't going to copy the entire array... it will be the same as any other function with an array argument, it'll degrade to a pointer to the first element, ie the same as void foo(int arg[][C]); or void foo(int (*arg)[C]);.

    Notably R doesn't need to be known at compile time for this type to work, though C does... but if R can change you'd need to pass it in as another parameter, same as any other variable-length array.

    (Perhaps you're getting mixed up with passing structs as parameters to functions, where they *are* copied to the stack...)

    On another note, one option for these I've seen is to mix-and-match the "array of pointers" method and the "single-dimensional array" method:
    int *buffer = calloc(R*C, sizeof(*buffer));
    int **array = calloc(R, sizeof(*array));
    for (int i = 0; i < R; i++)
    array[i] = buffer + (i * C);
    This has the simplicity-of-use of the array-of-pointers method, with the non-memory-fragmentingness of the single-dimensional method. Plus you only have two things to remember to free, instead of R+1. Though, if someone is using this I'll usually recommend they just bite the bullet and just do the single-dimensional method directly and save themselves some messing around.

    ReplyDelete
  2. Huh. You're right. Didn't have a standard to refer to, and didn't have a C compiler around to verify what it did. So I relied on a google search to verify I was correct. ... In retrospect, a silly thing to rely on.

    The array of pointer method with a contiguous chunk is a clear improvement, but it still has enough drawbacks to be worse than manual indexing. You're memoizing a simple arithmetic function that at best is faster than the time it takes to retrieve it, and at worst retrieving rather than calculating it on the spot it will prevent vectorization of your code.

    But for non-rectangular matrices of non-trivial geometry it's probably a decent alternative.

    ReplyDelete
  3. Appreciation for nice Updates, I found something new and folks can get useful info about BEST ONLINE TRAINING

    ReplyDelete