[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Ron Adam
rrr at ronadam.com
Wed Feb 15 11:24:38 CET 2006
Greg Ewing wrote:
> Ron Adam wrote:
>
>> My first impression and thoughts were: (and seems incorrect now)
>>
>> bytes(object) -> byte sequence of objects value
>>
>> Basically a "memory dump" of objects value.
>
> As I understand the current intentions, this is correct.
> The bytes constructor would have two different signatures:
>
> (1) bytes(seq) --> interprets seq as a sequence of
> integers in the range 0..255,
> exception otherwise
>
> (2a) bytes(str, encoding) --> encodes the characters of
> (2b) bytes(unicode, encoding) the string using the specified
> encoding
>
> In (2a) the string would be interpreted as containing
> ascii characters, with an exception otherwise. In 3.0,
> (2a) will disappear leaving only (1) and (2b).
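As a point of reference, signatures (1) and (2b) can be sketched against the bytes type that later shipped in Python 3. That type postdates this thread, so treating it as the design under discussion is an assumption; the snippet only illustrates the two call shapes and their failure modes.

```python
# Sketch using the Python 3 bytes type (an assumption: it postdates this thread).

# (1) A sequence of integers, each in the range 0..255.
b1 = bytes([72, 105])

# An integer outside 0..255 raises an exception.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

# (2b) A text string together with an explicit encoding.
b2 = bytes("Hi", "ascii")

# A text string with no encoding is rejected rather than guessed at.
try:
    bytes("Hi")
    missing_encoding_ok = True
except TypeError:
    missing_encoding_ok = False
```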
I was presuming it would be done in C code and would just need a
pointer to the first byte, memchr(), and then read n bytes directly into
a new memory range via memcpy(). But I don't know if that's possible
with Python's object model. (My C skills are a bit rusty as well.)
However, if it's done with a Python iterator, with each item then
translated to bytes in sequence (much slower), an encoding will need
to be known for it to work correctly. Unfortunately, Unicode strings
don't carry an attribute indicating their own encoding, so bytes() can't
just do encoding = s.encoding to find out; it would need to be specified
in this case.
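To illustrate why the encoding has to be supplied, here is a small sketch in Python 3 syntax (an assumption, since it postdates this thread). The same two-character string yields three different byte sequences depending on the encoding chosen, so there is no single "right" answer for bytes() to pick on its own.

```python
# The same text encodes to different bytes under different encodings.
s = "h\u00e9"  # two characters: 'h' and 'e' with an acute accent

utf8 = s.encode("utf-8")        # the accented character takes two bytes
latin1 = s.encode("latin-1")    # one byte per character
utf16 = s.encode("utf-16-le")   # two bytes per character, little-endian

all_different = len({utf8, latin1, utf16}) == 3
```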
And that should give you a bytes object that is equivalent to the bytes
in memory, provided Python doesn't compress data internally to save
space. (I don't think it does.)
I'd prefer the first version *if possible* because of the performance.
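A rough sketch of the difference between the two approaches, in Python 3 syntax (an assumption, since it postdates this thread). The array module stands in for "an object whose memory can be copied directly": bytes() takes its raw buffer in one step, while the second version converts item by item with struct. Both give the same result here only because the items are packed in the native format.

```python
import array
import struct

# C ints stored contiguously in memory.
values = array.array("i", [1, 2, 3])

# Direct copy: bytes() reads the object's raw buffer in one go.
raw = bytes(values)

# Item-by-item: each value is packed separately in native size and order.
packed = b"".join(struct.pack("@i", v) for v in values)
```

The result depends on the platform's int size and byte order, which is why no literal output is shown.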
>> And I was thinking a bytes argument of more than one item would indicate
>> a byte sequence.
>>
>> bytes(1,2,3) -> bytes([1,2,3])
>
> But then you have to test the argument in the one-argument
> case and try to guess whether it should be interpreted as
> a sequence or an integer. Best to avoid having to do that.
Yes, I agree.
>> Which is fine... so ???
>>
>> b = bytes(0L) -> bytes([0,0,0,0])
>
> No, bytes(0L) --> TypeError because 0L doesn't implement
> the iterator protocol or the buffer interface.
It wouldn't need either if it were a direct C memory copy.
> I suppose long integers might be enhanced to support the
> buffer interface in 3.0, but that doesn't seem like a good
> idea, because the bytes you got that way would depend on
> the internal representation of long integers. In particular,
Since longs can differ in length, yes, bytes(0L) could give differing
results on different platforms, but it will always give the same result
on the platform it is run on. I actually think this is a plus and not a
problem. If you are using Python to implement a byte interface, you
need to *know* it is different, not have it hidden.
bytesize = len(bytes(0L)) # find how long a long is
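For comparison, the same information can be obtained explicitly in Python 3 (an assumption, since it postdates this thread). This sketch takes "long" to mean the platform's native C long rather than Python's arbitrary-precision integer, and makes both the width and the byte order visible instead of relying on an internal representation.

```python
import struct
import sys

# Size in bytes of the platform's native C long.
long_size = struct.calcsize("l")

# Explicit conversion: the caller states both the width and the byte order.
zero = (0).to_bytes(long_size, sys.byteorder)
one = (1).to_bytes(long_size, sys.byteorder)
```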
Cheers,
Ronald Adam