[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Wed Feb 15 04:45:26 CET 2006

Greg Ewing wrote:
> Guido van Rossum wrote:
>> On 2/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>
>>> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>>
>>>> On 2/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>>>>
>>>> What would bytes("abc\xf0", "latin-1") *mean*? 
>>> I'm saying that XXX would be the same encoding as you specified.  i.e.,
>>> including an encoding means you are encoding the *meaning* of the string.
> 
> No, this is wrong. As I understand it, the encoding
> argument to bytes() is meant to specify how to *encode*
> characters into the bytes object. If you want to be able
> to specify how to *decode* a str argument as well, you'd
> need a third argument.

I'm not sure I understand why this would be needed?  But maybe it's 
still too early to pin anything down.

My first impression was this (though it seems incorrect now):

     bytes(object) ->  byte sequence of object's value

Basically a "memory dump" of the object's value.  And so...

     object(bytes) ->  copy of original object

This would reproduce a copy of the original object as long as the source 
and target objects are the same type, with no encoding needed.  If they are 
different then you would get garbage, or an error. But that would be a 
programming error and not a language issue. It would be up to the 
programmer to not do that.

Of course, I'm sure this is one of those easier-said-than-done concepts.

And I was thinking a bytes argument of more than one item would indicate 
a byte sequence.

     bytes(1,2,3)  ->  bytes([1,2,3])

Where any values above 255 would give an error,  but it seems an 
explicit list is preferred.  And that's fine because it creates a way 
for bytes to know how to handle everything else. (I think)

    bytes([1,2,3])  ->  bytes([1,2,3])

Which is fine... so ???

    b = bytes(0L) ->  bytes([0,0,0,0])

    long(b) ->  0L    convert it back to 0L

And ...

    b = bytes([0L])  ->  bytes([0])  # a single byte

    int(b) ->  0    convert it back to 0
    long(b) ->  0L

It's up to the programmer to know if it's safe. Working with raw data is 
always a situation where the programmer needs to be aware of what's going on.
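
To make the integer round trip above a bit more concrete, here is a rough 
sketch using the struct module. The 4-byte little-endian layout is only an 
assumption taken from the bytes([0,0,0,0]) example, and the helper names 
int_to_raw and raw_to_int are made up for illustration; nothing here is 
part of any proposal.

```python
import struct

# Assumed layout: 4-byte little-endian signed integer.
# Nothing in the thread fixes the width or the byte order.
FORMAT = "<i"

def int_to_raw(value):
    """Return the raw 4-byte form of an integer (hypothetical helper)."""
    return struct.pack(FORMAT, value)

def raw_to_int(raw):
    """Rebuild the integer; the caller must know the data really is an int."""
    expected = struct.calcsize(FORMAT)
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    return struct.unpack(FORMAT, raw)[0]

raw = int_to_raw(0)         # four zero bytes under the assumed layout
restored = raw_to_int(raw)  # back to the original value
```

The length check is the only safety net; handing raw_to_int four bytes that 
came from some other type would still "work" and return garbage, which is 
the programmer-needs-to-know situation described above.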

But would it be any different with strings?  You wouldn't ever want to 
encode one type's bytes into a different type directly. It would be 
better to just encode it back to the original type, then use *its* 
encoding method to change it.

so...

   b = bytes(s)  ->  bytes( raw sequence of bytes )

Whether or not you get a single byte per char or multiple bytes per 
character would depend on the string's encoding.

   s = str(bytes, encoding)  ->  original string

You need to specify it here, because there is more than one string 
encoding. To avoid encodings entirely we would need a type for each 
encoding. (which isn't really avoiding anything) And it's the "raw data 
so programmer needs to be aware" situation again. Don't decode to 
something other than what it is.
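
A small sketch of why the encoding has to be named on the way back. It uses 
the existing encode/decode methods on unicode strings as a stand-in for the 
bytes(s) and str(bytes, encoding) spellings above, which are only ideas at 
this point; the variable names are just for illustration.

```python
s = u"abc\xf0"

# The same characters give different raw bytes under different encodings.
latin1 = s.encode("latin-1")   # one byte per character
utf8 = s.encode("utf-8")       # the last character takes two bytes

# Decoding with the encoding that produced the bytes restores the string.
same = latin1.decode("latin-1")

# Decoding with a different encoding does not fail here, it just gives
# a different string.
garbled = utf8.decode("latin-1")
```

Note that the mismatched decode raises no error in this case, so nothing but 
the programmer's own bookkeeping catches the mistake.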

If someone needs automatic encoding/decoding, then they probably should 
write a class to do what they want.  Something roughly like...

   class bytekeeper(object):
      b = None
      t = None
      e = None
      def __init__(self, obj, enc='bytes'):   # or whatever encoding
         self.e = enc
         self.t = type(obj)
         self.b = bytes(obj)
      def decode(self):
         ...
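
For what it's worth, a runnable guess at how such a class might look. This 
is only a sketch under loud assumptions: it handles just text strings and 
small ints, defaults to utf-8, and packs ints as 4 bytes with struct. The 
name ByteKeeperSketch and all of those choices are hypothetical, not 
something settled in this thread.

```python
import struct

class ByteKeeperSketch(object):
    """Keep an object's raw bytes plus what is needed to rebuild it.

    Hypothetical: only text strings and small ints are handled.
    """

    def __init__(self, obj, enc="utf-8"):
        self.e = enc          # encoding, used only for text
        self.t = type(obj)    # original type, remembered for decode()
        if isinstance(obj, type(u"")):
            self.b = obj.encode(enc)
        elif isinstance(obj, int):
            self.b = struct.pack("<i", obj)   # assumed 4-byte layout
        else:
            raise TypeError("no raw form defined for %r" % (self.t,))

    def decode(self):
        """Return a copy of the original object."""
        if issubclass(self.t, type(u"")):
            return self.b.decode(self.e)
        return self.t(struct.unpack("<i", self.b)[0])

kept = ByteKeeperSketch(u"abc\xf0", "latin-1")
copy = kept.decode()
```

Because the instance remembers both the type and the encoding, the caller 
never has to repeat them when decoding, which is the "automatic" part.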

Would we be able to subclass bytes?

     class bytekeeper(bytes):   ?
        ...

Ok.. enough rambling... I wonder how much of this is way out in left 
field.  ;)

cheers,
  Ronald Adam
