[Numpy-discussion] Re: Bytes Object and Metadata

Wed Mar 30 11:49:03 EST 2005

>
> After more thought, I think using the struct-like type characters is
> not a good idea for the array protocol. I think the character codes
> used by the numarray record array (kind_character + byte_width) are
> better. Commas can separate heterogeneous data. The problem is that
> if the data buffer originally came from a different machine or was
> saved with a different compiler (e.g. a mmap'ed file), then the
> struct-like typecodes only tell you the C type that machine thought
> the data was. They do not tell you how to interpret the data on this
> machine.
> So,  I think we should use the __array_typestr__ method to pass type 
> information using the kind_character + byte_width method.  I'm also 
> going to use this type information for pickles, so that arrays pickled 
> on one machine type will be able to be interpreted on another with ease.
>
> Bool             -- "b%d" % sizeof(bool)
> Signed Integer   -- "i%d" % sizeof(<some int>)
> Unsigned Integer -- "u%d" % sizeof(<some uint>)
> Float            -- "f%d" % sizeof(<some float>)
> Complex          -- "c%d" % sizeof(<some complex>)
> Object           -- "O%d" % sizeof(PyObject *)
>                     (this would only be useful on shared memory)
> String           -- "S%d" % itemsize
> Unicode          -- "U%d" % itemsize
> Void             -- "V%d" % itemsize

Of course with this protocol for the typestr, the array_itemsize is 
redundant and can disappear.  Another reason to like it.
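
As a rough sketch of how the kind_character + byte_width strings could
be built and read back, including recovering the item size from the
typestr alone: the helper names make_typestr, parse_typestr and
record_itemsize are made up for illustration and are not part of any
proposal or library, and the record size assumes fields are packed with
no padding.

```python
# Kind characters taken from the table in the quoted message.
KINDS = ("b", "i", "u", "f", "c", "O", "S", "U", "V")


def make_typestr(kind, itemsize):
    """Build a typestr such as "i4" from a kind character and a byte width."""
    if kind not in KINDS:
        raise ValueError("unknown kind character: %r" % (kind,))
    if itemsize <= 0:
        raise ValueError("itemsize must be positive")
    return "%s%d" % (kind, itemsize)


def parse_typestr(typestr):
    """Split a single-field typestr into (kind, itemsize)."""
    kind, width = typestr[0], typestr[1:]
    if kind not in KINDS or not width.isdigit():
        raise ValueError("malformed typestr: %r" % (typestr,))
    return kind, int(width)


def record_itemsize(typestr):
    """Total byte width of a comma-separated (heterogeneous) typestr.

    Assumption: fields are packed back to back with no padding.
    """
    return sum(parse_typestr(field)[1] for field in typestr.split(","))


print(make_typestr("f", 8))          # a float stored in 8 bytes
print(parse_typestr("i4"))           # kind and width of one field
print(record_itemsize("i4,f8,S10"))  # width of a three-field record
```

Since the width is carried in the string itself, a consumer never needs
a separate itemsize attribute to step through the buffer.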

> I also think that rather than attach < or > to the start of the string 
> it would be easier to have another protocol for endianness.  Perhaps 
> something like:

> __array_endian__  (optional Python integer with the value 1 in it).
> If it is not 1, then a byteswap is necessary.

I have mixed feelings about this; I could be persuaded either way.
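
One possible reading of the __array_endian__ idea, sketched under
assumptions: the producer stores the integer 1 in its own byte order,
and the consumer reads those bytes in its native order. If the value
does not come back as 1, the data needs a byteswap. The 4-byte width
and the helper names make_endian_marker and needs_byteswap are
assumptions for illustration only.

```python
import struct
import sys


def make_endian_marker(byteorder):
    """Bytes of the 32-bit integer 1 as written by a machine with the
    given byte order ("little" or "big")."""
    if byteorder not in ("little", "big"):
        raise ValueError("byteorder must be 'little' or 'big'")
    fmt = "<i" if byteorder == "little" else ">i"
    return struct.pack(fmt, 1)


def needs_byteswap(marker):
    """Read the marker in this machine's native byte order.

    A value other than 1 means the producer used the opposite order.
    """
    (value,) = struct.unpack("=i", marker)
    return value != 1


native = sys.byteorder
foreign = "big" if native == "little" else "little"

print(needs_byteswap(make_endian_marker(native)))   # same byte order
print(needs_byteswap(make_endian_marker(foreign)))  # opposite byte order
```

The alternative in the quoted text, a leading < or > on the typestr,
carries the same information inside the string instead of in a separate
attribute.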

-Travis