[Python-Dev] bytes.from_hex()

Sat Feb 18 09:35:24 CET 2006

Josiah Carlson wrote:
> Bob Ippolito <bob at redivi.com> wrote:
>>
>> On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
>>
>>> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>>>> Stephen J. Turnbull wrote:
>>>>>>>>>> "Guido" == Guido van Rossum <guido at python.org> writes:
>>>>>     Guido> - b = bytes(t, enc); t = text(b, enc)
>>>>>
>>>>> +1  The coding conversion operation has always felt like a constructor
>>>>> to me, and in this particular usage that's exactly what it is.  I
>>>>> prefer the nomenclature to reflect that.
>>>> This also has the advantage that it completely
>>>> avoids using the verbs "encode" and "decode"
>>>> and the attendant confusion about which direction
>>>> they go in.
>>>>
>>>> e.g.
>>>>
>>>>    s = text(b, "base64")
>>>>
>>>> makes it obvious that you're going from the
>>>> binary side to the text side of the base64
>>>> conversion.
>>> But you aren't always getting *unicode* text from the decoding of bytes,
>>> and you may be encoding bytes *to* bytes:
>>>
>>>     b2 = bytes(b, "base64")
>>>     b3 = bytes(b2, "base64")
>>>
>>> Which direction are we going again?
>> This is *exactly* why the current set of codecs are INSANE.   
>> unicode.encode and str.decode should be used *only* for unicode  
>> codecs.  Byte transforms are entirely different semantically and  
>> should be some other method pair.
> 
> The problem is that we are overloading data types.  Strings (and bytes)
> can contain both encoded text as well as data, or even encoded data.

Right

> Educate the users.  Raise better exceptions telling people why their
> encoding or decoding failed, as Ian Bicking already pointed out.  If
> bytes.encode() and the equivalent of text.decode() is going to disappear,

+1 on better documentation all around with regard to encodings and
Unicode.  So far the best explanation I've found is in PEP 100.
The Python docs and built-in help hardly explain more than the minimal
argument list for the encoding and decoding methods, and the str and
unicode type constructor arguments aren't explained any better.

> Bengt Richter had a good idea with bytes.recode() for strictly bytes
> transformations (and the equivalent for text), though it is ambiguous as
> to the direction; are we encoding or decoding with bytes.recode()?  In
> my opinion, this is why .encode() and .decode() makes sense to keep on
> both bytes and text, the direction is unambiguous, and if one has even a
> remote idea of what the heck the codec is, they know their result.
> 
>  - Josiah

I like the bytes.recode() idea a lot. +1

It seems to me a far more useful idea than overloading encode and
decode, since it could do both and more.  It has a lot of potential as
an intermediate step for encoding, as well as for many other
translations to byte data.
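
As a rough illustration of what a bytes-only recode step might look like,
here is a minimal sketch.  It assumes a modern Python 3, where the standard
codecs module exposes bytes-to-bytes transforms such as "base64" and "hex".
The recode() function and its direction argument are hypothetical names
made up for this example, not an existing API.

```python
import codecs

def recode(data, transform, direction):
    """Hypothetical bytes-to-bytes transform with an explicit direction."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("recode() only accepts bytes-like data")
    if direction == "encode":
        result = codecs.encode(bytes(data), transform)
    elif direction == "decode":
        result = codecs.decode(bytes(data), transform)
    else:
        raise ValueError("direction must be 'encode' or 'decode'")
    # Refuse codecs that cross over to text; this stays on the bytes side.
    if not isinstance(result, bytes):
        raise TypeError("%r is not a bytes-to-bytes transform" % transform)
    return result

packed = recode(b"hello", "base64", "encode")
unpacked = recode(packed, "base64", "decode")
```

Spelling the direction out at the call site is one way around the
"which direction are we going again?" problem raised above.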

I think I would prefer that encode and decode be just functions with
well-defined names and arguments instead of being methods of, or
arguments to, the string and Unicode types.

I'm not sure exactly how this would work.  Maybe it would need two
sets of encodings, i.e. decoders and encoders.  An exception would be
raised if the codec wasn't found for the direction one was going in.

Roughly... something or other like:

     # Sketch only: these would live in the encodings module.  'encoders'
     # and 'decoders' are the two proposed lookup tables, and
     # bytes.recode() is the proposed method; none of them exist yet.

     def tostr(obj, encoding):
         if encoding not in encoders:
             raise LookupError('encoding not found in encoders')
         # check if obj works with encoding to string
         # ...
         b = bytes(obj).recode(encoding)
         return str(b)

     def tounicode(obj, decoding):
         if decoding not in decoders:
             raise LookupError('decoding not found in decoders')
         # check if obj works with decoding to unicode
         # ...
         b = bytes(obj).recode(decoding)
         return unicode(b)
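
The two-table idea can also be tried out with today's tools.  The following
is a minimal runnable sketch, assuming a modern Python 3 in which str is the
text type and bytes is the binary type.  The names encoders, decoders,
tobytes() and totext() are hypothetical and chosen only for illustration;
"ascii" is deliberately registered in one direction only, to show the lookup
failing for the other.

```python
# Hypothetical registries: one lookup table per direction.
encoders = {
    "utf-8": lambda text: text.encode("utf-8"),
    "ascii": lambda text: text.encode("ascii"),
}
decoders = {
    "utf-8": lambda data: data.decode("utf-8"),
}

def tobytes(text, encoding):
    """Text -> bytes, using only the encoders table."""
    if encoding not in encoders:
        raise LookupError("encoding not found in encoders: %r" % encoding)
    return encoders[encoding](text)

def totext(data, decoding):
    """Bytes -> text, using only the decoders table."""
    if decoding not in decoders:
        raise LookupError("decoding not found in decoders: %r" % decoding)
    return decoders[decoding](data)

raw = tobytes("caf\u00e9", "utf-8")
roundtrip = totext(raw, "utf-8")
```

Calling totext(b"abc", "ascii") raises LookupError here, because "ascii"
was only registered as an encoder.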

Anyway... food for thought.

Cheers,
    Ronald Adam