[Python-Dev] PEP 332 revival in coordination with PEP 349? [Was: Re: release plan for 2.5?]

Greg Ewing greg.ewing at canterbury.ac.nz
Thu Feb 16 00:54:56 CET 2006


Ron Adam wrote:

> I was presuming it would be done in C code and it will just need a 
> pointer to the first byte, memchr(), and then read n bytes directly into 
> a new memory range via  memcpy().

If the object supports the buffer interface, it can be
done that way. But if not, it would seem to make sense to
fall back on the iterator protocol.
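
A minimal sketch of that fallback, written against present-day
Python 3 (where memoryview exposes the buffer interface) rather
than the API under discussion here. The helper name to_bytes is
hypothetical:

```python
from array import array


def to_bytes(obj):
    """Copy obj into a bytes object (illustrative helper only)."""
    try:
        # Fast path: the object exposes its memory via the buffer interface.
        view = memoryview(obj)
    except TypeError:
        # Fallback: iterate and require each item to be an int in 0..255.
        return bytes(iter(obj))
    return view.tobytes()


from_buffer = to_bytes(array("B", [1, 2, 3]))  # buffer path
from_iter = to_bytes([97, 98, 99])             # iterator path
```

With this arrangement the list goes through the slower
per-item path, while the array is copied in one step.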

> However, if it's done with a Python iterator and then each item is 
> translated to bytes in a sequence, (much slower), an encoding will need 
> to be known for it to work correctly.

No, it won't. When using the bytes(x) form, encoding has
nothing to do with it. It's purely a conversion from one
representation of an array of integers in the range 0..255
to another.

When you *do* want to perform encoding, you use
bytes(u, encoding) and say what encoding you want
to use.
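
The two forms can be illustrated with the bytes type as it
exists in Python 3, which is assumed here to be close to, but
not necessarily the same as, what is being proposed:

```python
# Plain conversion: every item must already be an integer in 0..255.
# No encoding is involved.
raw = bytes([72, 105])

# Explicit encoding: a text string plus the name of the encoding to use.
encoded = bytes("caf\u00e9", "utf-8")

# Values outside 0..255 are rejected rather than silently encoded.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```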

> Unfortunately Unicode strings 
> don't set an attribute to indicate it's own encoding.

I think you don't understand what an encoding is. Unicode
strings don't *have* an encoding, because they're not encoded!
Encoding is what happens when you go from a Unicode string
to something else.
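
As a small illustration in Python 3 syntax (where str plays the
role of the unicode type), the same string yields different
bytes depending on the encoding chosen at conversion time:

```python
text = "\u00e9\u20ac"  # e-acute followed by the euro sign

# The string itself is just a sequence of code points.
code_points = [ord(ch) for ch in text]

# Bytes only appear once an encoding is chosen, and they differ per encoding.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

# Some encodings cannot represent every code point at all.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```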

> Since some longs will be of different length, yes a bytes(0L) could give 
> differing results on different platforms,

It's not just a matter of length. I'm not sure of the
details, but I believe longs are currently stored as an
array of 16-bit chunks, of which only 15 bits are used.
I'm having trouble imagining a use for low-level access
to that format, other than just treating it as an opaque
lump of data for turning back into a long later -- in
which case, why not just leave it as a long in the first
place?
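
A rough sketch of what such a layout looks like, assuming
15-bit digits stored least significant first. The names SHIFT,
to_digits and from_digits are made up for illustration, the
sign is ignored, and this is not a description of the actual
interpreter code:

```python
SHIFT = 15                 # assumed number of bits used per chunk
MASK = (1 << SHIFT) - 1    # 0x7FFF


def to_digits(n):
    """Split a non-negative integer into 15-bit chunks, lowest first."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits


def from_digits(digits):
    """Rebuild the integer from its 15-bit chunks."""
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n


chunks = to_digits(32768)  # one past the largest single-chunk value
```

The chunks are only meaningful to code that knows the chunk
size, which is why the raw form is of little use on its own.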

Greg


