Again, a relatively obvious tool for those who are more experienced with C than I, but this is something I recently stumbled across and found very useful.

One of my major annoyances with CUDA is the way that device emulation works: you go through your code, writing printf statements here and there, compile for device emulation and everything’s fine.  But remove the -deviceemu flag and compilation fails, because device functions cannot call host functions such as printf.  Until now, my only way around the error has been to comment out all of my print statements, which is pretty arduous.

The answer lies with a variadic macro.  Define something like this at the top of your CUDA files, or the top of a generic header file included everywhere:

#ifdef DEVICEEMU
#define debug(format, ...) printf(format, ## __VA_ARGS__)
#else
#define debug(format, ...)
#endif

With this in place, wherever you would have written printf("Some output = %d.\n", variable), you write debug("Some output = %d.\n", variable) instead.  If -D DEVICEEMU is passed as an argument to nvcc, every call to debug is replaced by the equivalent (and working) printf call.  If it isn’t, every call expands to nothing, leaving only an empty statement for the compiler to ignore.  Note that the ## before __VA_ARGS__ is a GNU extension: it removes the trailing comma when debug is called with a format string and no further arguments.
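To see the macro in action without a GPU, here is a minimal host-only sketch in plain C. The function name sum_with_trace is made up for illustration and is not part of CUDA or of the original code. It assumes a compiler that accepts the GNU ## __VA_ARGS__ extension, such as gcc.

```c
#include <stdio.h>

/* Same macro as above: forwards to printf only when DEVICEEMU is defined. */
#ifdef DEVICEEMU
#define debug(format, ...) printf(format, ## __VA_ARGS__)
#else
#define debug(format, ...)
#endif

/* Hypothetical helper (illustration only): sums n ints and traces
   each step through debug(). */
int sum_with_trace(const int *values, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += values[i];
        debug("i = %d, running total = %d\n", i, total);
    }
    debug("done\n");  /* no extra arguments: ## drops the trailing comma */
    return total;
}
```

Built with -DDEVICEEMU, each call to sum_with_trace prints its trace lines. Built without it, the debug lines vanish at preprocessing time and the function returns the same sum silently. One thing to watch for: the arguments disappear along with the call, so don't put anything with side effects (an increment, say) inside a debug call.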

A few quick changes to your Makefile and everything’s pretty much automatic.  Thanks, variadic macros!