
After more thought, I think using the struct-like type characters is not a good idea for the array protocol. I think the character codes used by the numarray record array (kind_character + byte_width) are better, with commas separating heterogeneous data. The problem is that if the data buffer originally came from a different machine, or was saved by code built with a different compiler (e.g. a mmap'ed file), then the struct-like typecodes only tell you which C type that machine thought the data was. They do not tell you how to interpret the data on this machine. So I think we should use the __array_typestr__ method to pass type information in the kind_character + byte_width form. I'm also going to use this type information for pickles, so that arrays pickled on one machine type can be interpreted on another with ease.
Bool             -- "b%d" % sizeof(bool)
Signed Integer   -- "i%d" % sizeof(<some int>)
Unsigned Integer -- "u%d" % sizeof(<some uint>)
Float            -- "f%d" % sizeof(<some float>)
Complex          -- "c%d" % sizeof(<some complex>)
Object           -- "O%d" % sizeof(PyObject *) --- this would only be useful on shared memory
String           -- "S%d" % itemsize
Unicode          -- "U%d" % itemsize
Void             -- "V%d" % itemsize
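To make the table concrete, here is a minimal sketch of building such strings in Python. The helper names (make_typestr, record_typestr) are made up for illustration and are not part of any proposed API; the struct module is used only as a stand-in for sizeof() on the local machine.

```python
import struct

def make_typestr(kind, nbytes):
    """Build a kind_character + byte_width code, e.g. ("f", 8) -> "f8"."""
    return "%s%d" % (kind, nbytes)

def record_typestr(fields):
    """Join per-field codes with commas for heterogeneous data."""
    return ",".join(fields)

# struct.calcsize() reports the native C sizes on this machine,
# which plays the role of sizeof() in the table above.
native_int = make_typestr("i", struct.calcsize("i"))          # C int
native_double = make_typestr("f", struct.calcsize("d"))       # C double
native_cdouble = make_typestr("c", 2 * struct.calcsize("d"))  # two doubles
native_object = make_typestr("O", struct.calcsize("P"))       # pointer width

# A record made of a 4-byte int, an 8-byte float and a 10-byte string.
record = record_typestr([
    make_typestr("i", 4),
    make_typestr("f", 8),
    make_typestr("S", 10),
])
print(native_int, native_double, native_cdouble, native_object, record)
```

The point is that the width is written into the string itself, so a reader on another machine does not have to guess what "int" or "long" meant where the data was produced.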
Of course, with this protocol for the typestr, the array_itemsize is redundant and can disappear, since the byte width is already part of the string. Another reason to like it.
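As a rough illustration of why the item size becomes redundant, the sketch below recovers it from the typestr alone. The function names are hypothetical, and it assumes that the fields of a comma-separated record are packed back to back with no padding, which the proposal above does not actually specify.

```python
import re

# One field: a kind character from the table, followed by a byte width.
_FIELD = re.compile(r"^([biufcOSUV])(\d+)$")

def parse_field(field):
    """Split a single code such as "f8" into ("f", 8)."""
    match = _FIELD.match(field)
    if match is None:
        raise ValueError("not a kind_character + byte_width code: %r" % field)
    return match.group(1), int(match.group(2))

def itemsize_from_typestr(typestr):
    """Total bytes per item, assuming unpadded, back-to-back fields."""
    return sum(parse_field(f)[1] for f in typestr.split(","))

print(itemsize_from_typestr("f8"))         # single field
print(itemsize_from_typestr("i4,f8,S10"))  # heterogeneous record
```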
I also think that rather than attach < or > to the start of the string it would be easier to have another protocol for endianness. Perhaps something like:
__array_endian__ (optional Python integer with the value 1 in it). If it does not read as 1, then a byteswap is necessary.
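Here is a sketch of one possible reading of this idea: the producer stores the integer 1 in its own byte order next to the data, and the consumer reads it back in native order. If the result is not 1, the data needs a byteswap. The function names and the 4-byte marker width are assumptions made for the example, not part of the proposal.

```python
import struct

def needs_byteswap(marker_bytes):
    """Read a 4-byte marker in native order; anything but 1 means swap."""
    (value,) = struct.unpack("=i", marker_bytes)
    return value != 1

def byteswap(raw, width):
    """Reverse the bytes of every width-sized item in a raw buffer."""
    return b"".join(raw[i:i + width][::-1] for i in range(0, len(raw), width))

native_marker = struct.pack("=i", 1)  # written on this machine
foreign_marker = native_marker[::-1]  # as written by the opposite byte order

print(needs_byteswap(native_marker))   # False
print(needs_byteswap(foreign_marker))  # True
print(byteswap(b"\x01\x02\x03\x04", 2))
```

Compared with a leading < or > in the typestr, this keeps the type string free of byte-order information, at the cost of one more optional attribute to look up.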
I'm mixed on this; I could be persuaded either way.

-Travis