[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Ron Adam rrr at ronadam.com
Wed Feb 15 11:24:38 CET 2006


Greg Ewing wrote:
> Ron Adam wrote:
> 
>> My first impression and thoughts were:  (and seems incorrect now)
>>
>>      bytes(object) ->  byte sequence of objects value
>>
>> Basically a "memory dump" of objects value.
> 
> As I understand the current intentions, this is correct.
> The bytes constructor would have two different signatures:
> 
>     (1)   bytes(seq) --> interprets seq as a sequence of
>                          integers in the range 0..255,
>                          exception otherwise
> 
>     (2a)  bytes(str, encoding)     --> encodes the characters of
>     (2b)  bytes(unicode, encoding)     the string using the specified
>                                        encoding
> 
> In (2a) the string would be interpreted as containing
> ascii characters, with an exception otherwise. In 3.0,
> (2a) will disappear leaving only (1) and (2b).

I was presuming it would be done in C code, which would just need a 
pointer to the first byte and could then copy n bytes directly into a 
new memory range via memcpy(). But I don't know if that's possible 
with Python's object model.  (My C skills are a bit rusty as well.)

However, if it's done with a Python iterator, with each item then 
translated to bytes in sequence (much slower), an encoding will need 
to be known for it to work correctly.  Unfortunately, Unicode strings 
don't carry an attribute indicating their own encoding, so bytes() 
can't just do encoding = s.encoding to find out; it would need to be 
specified in this case.

And that should give you a bytes object that is equivalent to the bytes 
in memory, provided Python doesn't compress data internally to save 
space.  (I don't think it does.)

I'd prefer the first version *if possible* because of the performance.
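
To illustrate the point about the encoding having to be specified, here 
is a minimal sketch. It assumes the bytes type as it exists in 
present-day Python 3, not the constructor being proposed in this thread, 
and the variable names are only for illustration.

```python
# Sketch only: uses the Python 3 bytes type, not the 2006 proposal.
text = "caf\u00e9"  # four characters, the last one is non-ASCII

# The same string yields a different byte sequence for each encoding.
as_utf8 = bytes(text, "utf-8")
as_latin1 = bytes(text, "latin-1")
as_utf16 = bytes(text, "utf-16-le")

print(len(as_utf8), len(as_latin1), len(as_utf16))

# The string carries no encoding of its own, so omitting it is an error.
try:
    bytes(text)
    no_encoding_error = None
except TypeError as exc:
    no_encoding_error = exc

print(type(no_encoding_error).__name__)
```

Which of these results (if any) matches the in-memory representation 
depends on the interpreter build, which is why the caller has to choose.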

>> And I was thinking a bytes argument of more than one item would indicate 
>> a byte sequence.
>>
>>      bytes(1,2,3)  ->  bytes([1,2,3])
> 
> But then you have to test the argument in the one-argument
> case and try to guess whether it should be interpreted as
> a sequence or an integer. Best to avoid having to do that.

Yes, I agree.
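
A rough sketch of the guessing involved, using a hypothetical helper 
named make_bytes (not a real or proposed API) built on the Python 3 
bytes type:

```python
# Hypothetical varargs constructor, for illustration only.
def make_bytes(*args):
    if len(args) == 1:
        arg = args[0]
        # One argument: guess whether it is a sequence or a single integer.
        try:
            items = list(arg)   # works if arg is iterable
        except TypeError:
            items = [arg]       # otherwise treat it as one byte value
        return bytes(items)
    # Several arguments: each one is a byte value.
    return bytes(args)


print(make_bytes(1, 2, 3) == make_bytes([1, 2, 3]))
print(make_bytes(7))
```

The one-argument branch is where the ambiguity lives. The built-in 
bytes(3) in Python 3 picks yet another reading of a lone integer: three 
zero bytes.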

>> Which is fine... so ???
>>
>>     b = bytes(0L) ->  bytes([0,0,0,0])
> 
> No, bytes(0L) --> TypeError because 0L doesn't implement
> the iterator protocol or the buffer interface.

It wouldn't need either if it were a direct C memory copy.

> I suppose long integers might be enhanced to support the
> buffer interface in 3.0, but that doesn't seem like a good
> idea, because the bytes you got that way would depend on
> the internal representation of long integers. In particular,

Since longs can differ in length, yes, bytes(0L) could give differing 
results on different platforms, but it will always give the same result 
on the platform it is run on. I actually think this is a plus and not 
a problem. If you are using Python to implement a byte interface, you 
need to *know* it is different, not have it hidden.

     bytesize = len(bytes(0L))  # find how long a long is
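
The line above is hypothetical. A sketch of the same "find how long a 
long is" idea that runs today, assuming what is wanted is the size of 
the platform's C long (Python's own long integers are arbitrary 
precision), could use the standard struct module:

```python
import struct

# Size in bytes of a native C long on this platform (commonly 4 or 8).
c_long_size = struct.calcsize("l")

# Native-size, native-order packing of 0: a memory image of a C long.
zero_image = struct.pack("l", 0)

# The same value with an explicit, portable layout: 4 bytes, big-endian.
portable = struct.pack(">l", 0)

print(c_long_size, len(zero_image), len(portable))
```

The native form exposes the platform difference, while the ">l" form 
hides it behind a fixed layout.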


Cheers,
   Ronald Adam



