
This made me think of a serious performance limitation of structured dtypes: a structured dtype is always "packed", which may lead to terrible byte alignment for common types. For instance, `dtype([('a', 'u1'), ('b', 'u8')]).itemsize == 9`, meaning that the 8-byte integer is not aligned as an equivalent C-struct's would be, leading to all sorts of horrors at the cache and register level. Python's ctypes does the right thing here, and can be mined for ideas. For instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer is on the correct boundary:
    import ctypes
    class Aligned(ctypes.Structure):
        _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)]
    print(ctypes.sizeof(Aligned))  # --> 16 on typical 64-bit platforms
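For comparison, here is a small sketch of the same check on the NumPy side. It assumes a NumPy version in which `np.dtype` accepts `align=True` (this is not mentioned in the message above), and the helper `offsets` is made up for illustration. It prints the item size and per-field byte offsets for the packed and the padded layout.

```python
import numpy as np

fields = [('a', 'u1'), ('b', 'u8')]

packed = np.dtype(fields)               # default layout: no padding between fields
aligned = np.dtype(fields, align=True)  # assumption: align=True is available

def offsets(dt):
    # Hypothetical helper: map each field name to its byte offset in one record.
    return {name: dt.fields[name][1] for name in dt.names}

print(packed.itemsize, offsets(packed))
print(aligned.itemsize, offsets(aligned))
```

In the packed layout 'b' starts at byte 1. With `align=True` it should start on a boundary suitable for an 8-byte integer on the platform, at the cost of a larger item size.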
I'd be surprised if someone hasn't already proposed fixing this, although perhaps this would be outside the scope of a GSoC project. I'm willing to wager that the performance improvements would be easily measurable.
I've been confronted with this very problem and ended up coding a "group class", which is a "split" structured array (each field is stored as a single array) offering the same interface as a regular structured array.
http://www.loria.fr/~rougier/coding/software/numpy_group.py

Nicolas
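A rough sketch of what such a "split" group could look like is given here. It is not the code at the URL above. The class name `Group` and its methods are hypothetical, and only field access and single-record access are covered. Each field lives in its own NumPy array, so every field is aligned for its own type regardless of the other fields.

```python
import numpy as np

class Group:
    """Hypothetical 'split' structured array: one separate array per field."""

    def __init__(self, shape, dtype):
        dtype = np.dtype(dtype)
        self.dtype = dtype
        self.shape = (shape,) if isinstance(shape, int) else tuple(shape)
        # One backing array per field, each with that field's own dtype.
        self._data = {name: np.zeros(self.shape, dtype=dtype.fields[name][0])
                      for name in dtype.names}

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._data[key]  # field access returns the backing array
        # Index access gathers one value per field, like a record.
        return tuple(self._data[name][key] for name in self.dtype.names)

    def __setitem__(self, key, value):
        if isinstance(key, str):
            self._data[key][...] = value  # assign to a whole field
        else:
            # Scatter one record's values into the per-field arrays.
            for name, v in zip(self.dtype.names, value):
                self._data[name][key] = v

g = Group(4, [('a', 'u1'), ('b', 'u8')])
g['b'] = np.arange(4)
g[0] = (7, 100)
print(g['a'], g['b'], g[0])
```

The trade-off in this layout is that a single record is no longer contiguous in memory, so record-wise access touches one array per field, while field-wise operations run over plain contiguous arrays.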