[Python-Dev] bytes.from_hex()

Sat Feb 18 23:15:17 CET 2006

Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
>>
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
>>
>> I'm not sure on exactly how this would work. Maybe it would need two 
>> sets of encodings, ie.. decoders, and encoders.  An exception would be
>> given if it wasn't found for the direction one was going in.
> 
> Here's an idea I don't think I've seen before:
> 
> bytes.recode(b, src_encoding, dest_encoding)
> 
> This requires the user to state up-front what the source encoding is.
> One of the big problems that I see with the whole encoding mess is that
> so much of it contains implicit assumptions about the source encoding;
> this gets away from that.

Yes, but it's not just the encodings that are implicit, it is also the 
types.

    s.encode(codec)  # explicit source type, ? dest type
    s.decode(codec)  # explicit source type, ? dest type

    encodings.tostr(obj, codec) # implicit *known* source type
                                # explicit dest type

    encodings.tounicode(obj, codec) # implicit *known* source type
                                    # explicit dest type

In this case the source is implicit, but there can be a well defined 
check to validate the source type against the codec being used.  It's my 
feeling the user *knows* what he already has, and so it's more important 
that the resulting object type is explicit.

In your suggestion...

    bytes.recode(b, src_encoding, dest_incoding)

Here the encodings are both explicit, but the both the source and the 
destinations of the bytes are not.  Since it working on bytes, they 
could have come from anywhere, and after the translation they would then 
will be cast to the type the user *thinks* it should result in.  A 
source of errors that would likely pass silently.

The way I see it is the bytes type should be a lower level object that 
doesn't care what byte transformation it does. Ie.. they are all one way 
byte to byte transformations determined by context.  And it should have 
the capability to read from and write to types without translating in 
the same step.  Keep it simple.

Then it could be used as a lower level byte translator to implement 
encodings and other translations in encoding methods or functions 
instead of trying to make it replace the higher level functionality.

Cheers,
    Ron