Re: [Numpy-discussion] Re: Bytes Object and Metadata

March 30, 2005


      ...
After more thought, I think using the struct-like type characters is
not a good idea for the array protocol. I think that the character
codes used by the numarray record array, kind_character + byte_width,
are better. Commas can separate heterogeneous data. The problem is
that if the data buffer originally came from a different machine or
was saved with a different compiler (e.g. an mmap'ed file), then the
struct-like typecodes only tell you the C type that machine thought
the data was. They do not tell you how to interpret the data on this
machine.

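
A small illustration of the problem, using Python's struct module (my
own example, not part of the proposal): a native struct code such as
'l' means "whatever a C long is here", which differs between platforms,
whereas a typestr such as "i8" records the byte width explicitly.

```python
import struct

# Native mode (no prefix): the size of a C 'long' as this machine's
# compiler sees it, typically 8 on 64-bit Linux/macOS and 4 on 64-bit Windows.
native_long = struct.calcsize("l")

# Standard mode ('<' or '>'): sizes are fixed by the struct module itself,
# so 'l' is always 4 bytes regardless of the machine.
standard_long = struct.calcsize("<l")

print("native 'l':", native_long, "bytes")
print("standard '<l':", standard_long, "bytes")

# A buffer written where a native long was 8 bytes would be misread with
# native 'l' on a 4-byte-long machine. With a typestr like "i8" the reader
# knows to consume 8 bytes per item, whatever its own C types are.
payload = (123456789).to_bytes(8, "little", signed=True)
value = int.from_bytes(payload, "little", signed=True)  # reader honours "i8"
```
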
So, I think we should use the __array_typestr__ method to pass type
information in the kind_character + byte_width form. I'm also going
to use this type information for pickles, so that arrays pickled on
one machine type can easily be interpreted on another.

Bool              -- "b%d" % sizeof(bool)
Signed Integer    -- "i%d" % sizeof(<some int>)
Unsigned Integer  -- "u%d" % sizeof(<some uint>)
Float             -- "f%d" % sizeof(<some float>)
Complex           -- "c%d" % sizeof(<some complex>)
Object            -- "O%d" % sizeof(PyObject *)  (this would only be useful on shared memory)
String            -- "S%d" % itemsize
Unicode           -- "U%d" % itemsize
Void              -- "V%d" % itemsize

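
A minimal sketch of building and parsing such typestrs, including the
comma-separated form for heterogeneous data. The helper names
make_typestr and parse_typestr are hypothetical, and the sketch assumes
each field is exactly one kind character followed by a decimal width.

```python
# Kind characters from the table above.
KINDS = {"b", "i", "u", "f", "c", "O", "S", "U", "V"}

def make_typestr(kind, width):
    """Build a single-field typestr such as 'i4' or 'f8' (hypothetical helper)."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind character: {kind!r}")
    if width <= 0:
        raise ValueError("byte width must be positive")
    return f"{kind}{width}"

def parse_typestr(typestr):
    """Split a typestr like 'i4,f8,S10' into [(kind, width), ...]."""
    fields = []
    for field in typestr.split(","):
        field = field.strip()
        kind, digits = field[:1], field[1:]
        if kind not in KINDS or not digits.isdigit():
            raise ValueError(f"malformed field: {field!r}")
        fields.append((kind, int(digits)))
    return fields

print(parse_typestr("i4,f8,S10"))  # [('i', 4), ('f', 8), ('S', 10)]
```
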
Of course, with this typestr protocol, array_itemsize becomes
redundant and can disappear: another reason to like it.
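
To show why the item size becomes redundant: it can be recovered by
summing the byte widths in the typestr. This sketch assumes fields are
packed with no alignment padding (the proposal does not discuss
padding), and itemsize_from_typestr is a hypothetical name.

```python
import struct

def itemsize_from_typestr(typestr):
    """Item size in bytes implied by a typestr such as 'i4,f8,S10'.

    Each comma-separated field is kind_character + byte_width, so the size
    of one element is the sum of the widths (assuming packed fields).
    """
    return sum(int(field.strip()[1:]) for field in typestr.split(","))

# Cross-check against struct's standard ('<', unpadded) layout for the same
# record: a 4-byte int, an 8-byte double and a 10-byte string.
assert itemsize_from_typestr("i4,f8,S10") == struct.calcsize("<id10s") == 22
```
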
...
I also think that, rather than attaching < or > to the start of the
string, it would be easier to have another protocol for endianness.
Perhaps something like:
...
__array_endian__ (optional Python integer with the value 1 in it).
If it is not 1, then a byteswap is necessary.
I'm mixed on this; I could be persuaded either way.
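
A rough sketch of how a consumer might act on the suggested flag. I am
reading "the value 1" as "the data are already in this machine's byte
order" and treating a missing attribute the same way; both readings are
assumptions, as are the Exporter and read_int32s names. The data are
assumed to be typestr "i4".

```python
import sys

class Exporter:
    """Hypothetical producer exposing raw "i4" data and the proposed flag."""
    def __init__(self, raw, endian_flag=None):
        self.raw = raw
        if endian_flag is not None:
            # Dunder names ending in "__" are not name-mangled.
            self.__array_endian__ = endian_flag

def read_int32s(obj):
    """Decode 4-byte signed ints, byteswapping unless __array_endian__ is 1."""
    native = sys.byteorder  # 'little' or 'big'
    swapped = "big" if native == "little" else "little"
    # Absent attribute -> assume native order; any value other than 1 means
    # the producer's byte order differs, so decode with the opposite order.
    order = native if getattr(obj, "__array_endian__", 1) == 1 else swapped
    raw = obj.raw
    return [int.from_bytes(raw[i:i + 4], order, signed=True)
            for i in range(0, len(raw), 4)]
```
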

-Travis

Travis Oliphant