[Numpy-discussion] Bytes vs. Unicode in Python3

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Thu Dec 3 08:03:13 EST 2009


Pauli Virtanen wrote:
> Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote:
> [clip]
>   
>> One thing to keep in mind here is that PEP 3118 actually defines a
>> standard dtype format string, which is (mostly) incompatible with
>> NumPy's. It should probably be supported as well when PEP 3118 is
>> implemented.
>>     
>
> PEP 3118 is for the most part implemented in my Py3K branch now -- it was 
> not actually much work, as I could steal most of the format string 
> converter from numpy.pxd.
>   
Great! Are you storing the format string in the dtype types as well? (So 
that no release is needed and acquisitions are cheap...)

As far as numpy.pxd goes -- well, that works for the simplest dtypes.
> Some questions:
>
> How hard do we want to try supplying a buffer? Eg. if the consumer does 
> not specify strided but specifies suboffsets, should we try to compute 
> suitable suboffsets? Should we try making contiguous copies of the data 
> (I guess this would break buffer semantics?)?
>   
Actually per the PEP, suboffsets imply strided:

#define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES)

:-) So there's no real way for a consumer to specify only suboffsets; 
0x0100 on its own is not a valid flag, I think. Suboffsets can't really 
work without the strides anyway IIUC, and in the case of NumPy the 
suboffsets field can always be left at 0.
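
To make the flag arithmetic concrete, here is a small sketch. The 
constant values are copied by hand from the PEP 3118 definitions rather 
than imported from anywhere, and the two helper functions are 
hypothetical names used only for illustration:

```python
# Flag values copied by hand from the PEP 3118 definitions (not imported).
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES


def requests_strides(flags):
    """Return True if the request includes every bit of PyBUF_STRIDES."""
    return (flags & PyBUF_STRIDES) == PyBUF_STRIDES


def requests_suboffsets(flags):
    """Return True if the request includes every bit of PyBUF_INDIRECT."""
    return (flags & PyBUF_INDIRECT) == PyBUF_INDIRECT


# A consumer asking for suboffsets has necessarily asked for strides too.
indirect_implies_strides = requests_strides(PyBUF_INDIRECT)
```

With these definitions, a request for strides alone does not count as a 
request for suboffsets, but the reverse always holds.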

IMO one should very much stay clear of making contiguous copies, 
especially considering the existence of PyBuffer_ToContiguous, which 
makes it trivial for client code to get a pointer to a contiguous buffer 
anyway. The intention of the PEP seems to be to export the buffer in as 
raw a form as possible.
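
The same division of labour can be sketched from the Python side. This 
uses the built-in memoryview over a bytes object as a stand-in exporter; 
it is only an analogy for the C-level PyBuffer_ToContiguous call, not a 
demonstration of that function itself:

```python
# Stdlib-only stand-in for an exporter: a strided view over a bytes object.
data = b"abcdef"
view = memoryview(data)[::2]  # every second byte, so the stride is 2

# The exporter hands out the raw, non-contiguous layout unchanged ...
raw_strides = view.strides
is_packed = view.contiguous

# ... and the consumer makes a packed copy only if it actually needs one.
packed = view.tobytes()
```

The exporter never copies anything here; the copy happens on the 
consumer's side, and the original data is left untouched.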

Do keep in mind that IS_C_CONTIGUOUS and IS_F_CONTIGUOUS can be too 
conservative with NumPy arrays. If a contiguous buffer is requested, 
then looping through the strides and checking that they are 
monotonically decreasing/increasing could save copying in some cases. 
I think that could be worth it -- I actually have my own code for 
IS_F_CONTIGUOUS rather than relying on the flags, precisely because of 
this issue, so it does come up in practice.
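
A rough sketch of such a stride-based check follows. It is not the code 
referred to above; the function names and the (shape, strides, itemsize) 
signature are assumptions made for illustration. It is also somewhat 
stricter than a pure monotonicity test: each stride is compared against 
the value a packed layout would have, and axes of extent 1 are skipped 
because their stride never affects addressing.

```python
def is_c_contiguous(shape, strides, itemsize):
    """Check C (row-major) contiguity from shape and byte strides."""
    if 0 in shape:
        return True  # an empty array has no elements to misplace
    expected = itemsize
    # Walk from the last (fastest-varying) axis to the first.
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if extent == 1:
            continue  # the stride of a length-1 axis is irrelevant
        if stride != expected:
            return False
        expected *= extent
    return True


def is_f_contiguous(shape, strides, itemsize):
    """Check Fortran (column-major) contiguity by reversing the axes."""
    return is_c_contiguous(shape[::-1], strides[::-1], itemsize)


# A 3x4 array of 8-byte items stored column-major.
f_ordered = is_f_contiguous((3, 4), (8, 24), 8)
# A 1x4 array with a meaningless stride on the length-1 axis.
odd_stride = is_f_contiguous((1, 4), (999, 8), 8)
```

The second case is the interesting one: a check that skips length-1 axes 
can report the buffer as contiguous even when the stride stored for that 
axis looks wrong, which is where a copy can be avoided.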

Dag Sverre
