[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Ron Adam
rrr at ronadam.com
Wed Feb 15 04:45:26 CET 2006
Greg Ewing wrote:
> Guido van Rossum wrote:
>> On 2/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>
>>> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>>
>>>> On 2/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>>>
>>>> What would bytes("abc\xf0", "latin-1") *mean*?
>>> I'm saying that XXX would be the same encoding as you specified. i.e.,
>>> including an encoding means you are encoding the *meaning* of the string.
>
> No, this is wrong. As I understand it, the encoding
> argument to bytes() is meant to specify how to *encode*
> characters into the bytes object. If you want to be able
> to specify how to *decode* a str argument as well, you'd
> need a third argument.
I'm not sure I understand why this would be needed? But maybe it's
still too early to pin anything down.
My first impression (which now seems incorrect) was:
bytes(object) -> byte sequence of the object's value
Basically a "memory dump" of the object's value. And so...
object(bytes) -> copy of original object
This would reproduce a copy of the original object as long as the from
and to object are the same type with no encoding needed. If they are
different then you would get garbage, or an error. But that would be a
programming error and not a language issue. It would be up to the
programmer to not do that.
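
A rough sketch of that round trip, using the struct module from today's Python 3 as a stand-in for the bytes(object) idea above (the float value and the "<d"/"<q" formats are only illustrative choices, not part of the proposal):

```python
import struct

# "Memory dump" of a float: its raw 8-byte little-endian representation.
raw = struct.pack("<d", 1.5)

# Reading the bytes back as the same type recovers the original value.
(same_type,) = struct.unpack("<d", raw)

# Reading the same bytes as a different type (a signed 64-bit integer)
# succeeds, but the result is meaningless: the "garbage" case.
(other_type,) = struct.unpack("<q", raw)

print(same_type, other_type)
```

Nothing in the bytes themselves records which type they came from, so keeping track of that is left to the programmer.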
Of course this is one of those easier-said-than-done concepts, I'm sure.
And I was thinking a bytes argument of more than one item would indicate
a byte sequence.
bytes(1,2,3) -> bytes([1,2,3])
Where any values above 255 would give an error, but it seems an
explicit list is preferred. And that's fine because it creates a way
for bytes to know how to handle everything else. (I think)
bytes([1,2,3]) -> bytes([1,2,3])
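
For comparison, this is how the explicit-list form behaves in the bytes type that Python 3 eventually shipped (which may differ from what was being discussed at the time of this thread):

```python
# An explicit list of small integers becomes a byte sequence.
b = bytes([1, 2, 3])

# Any value outside range(256) is rejected with a ValueError.
try:
    bytes([1, 2, 256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

print(b, out_of_range_rejected)
```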
Which is fine... so ???
b = bytes(0L) -> bytes([0,0,0,0])
long(b) -> 0L convert it back to 0L
And ...
b = bytes([0L]) -> bytes([0]) # a single byte
int(b) -> 0 convert it back to 0
long(b) -> 0L
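
A minimal sketch of the same integer round trip in Python 3, where the width and byte order have to be given explicitly. The 4-byte width and big-endian order are assumptions chosen to mirror the example above:

```python
# Integer -> bytes: the caller states how many bytes and which byte order.
b = (0).to_bytes(4, "big")

# bytes -> integer: the same byte order must be used to get the value back.
n = int.from_bytes(b, "big")

# A single byte converts back the same way.
single = bytes([0])
m = int.from_bytes(single, "big")

print(b, n, single, m)
```

Note that in Python 3 bytes(4) means "four zero bytes", not "the bytes of the integer 4", so the conversion goes through int.to_bytes() and int.from_bytes() instead.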
It's up to the programmer to know if it's safe. Working with raw data
always requires the programmer to be aware of what's going on.
But would it be any different with strings? You wouldn't ever want to
encode one type's bytes into a different type directly. It would be
better to just encode it back to the original type, then use *its*
encoding method to change it.
so...
b = bytes(s) -> bytes( raw sequence of bytes )
Whether or not you get a single byte per char or multiple bytes per
character would depend on the string's encoding.
s = str(bytes, encoding) -> original string
You need to specify it here, because there is more than one string
encoding. To avoid encodings entirely we would need a type for each
encoding. (which isn't really avoiding anything) And it's the "raw data
so programmer needs to be aware" situation again. Don't decode to
something other than what it is.
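
To illustrate with Python 3's str and bytes (not the 2006 proposal itself), here is the "abc\xf0" string from the quoted question encoded two ways, and what happens when the bytes are decoded as something other than what they are:

```python
s = "abc\xf0"

# One byte per character in latin-1, two bytes for the last one in UTF-8.
latin = s.encode("latin-1")
utf8 = s.encode("utf-8")

# Decoding with the encoding that produced the bytes gives the string back.
round_trip = str(utf8, "utf-8")

# Decoding with a different encoding "works" but yields a different string.
wrong = str(utf8, "latin-1")

print(len(latin), len(utf8), round_trip == s, wrong == s)
```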
If someone needs automatic encoding/decoding, then they probably should
write a class to do what they want. Something roughly like...
    class bytekeeper(object):
        b = None    # the raw bytes
        t = None    # the original object's type
        e = None    # the encoding
        def __init__(self, obj, enc='bytes'):   # or whatever encoding
            self.e = enc
            self.t = type(obj)
            self.b = bytes(obj)
        def decode(self):
            ...
Would we be able to subclass bytes?
class bytekeeper(bytes): ?
...
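
In Python 3 as it exists today, bytes can be subclassed. A hypothetical sketch of the idea above, limited to str input for simplicity (the enc attribute, the decode_original() method, and the "utf-8" default are illustrative assumptions, not an existing API):

```python
class bytekeeper(bytes):
    """bytes subclass that remembers the encoding used to create it."""

    def __new__(cls, text, enc="utf-8"):
        # bytes is immutable, so the data is set in __new__, not __init__.
        self = super().__new__(cls, text.encode(enc))
        self.enc = enc  # hypothetical attribute: the encoding used
        return self

    def decode_original(self):
        # Decode with the remembered encoding, so the caller cannot
        # accidentally pick a different one.
        return self.decode(self.enc)


kept = bytekeeper("abc\xf0", "latin-1")
print(kept, kept.decode_original() == "abc\xf0")
```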
Ok.. enough rambling... I wonder how much of this is way out in left
field. ;)
cheers,
Ronald Adam
And as fa
In this case the encoding argument would only be needed not to