[Python-Dev] bytes.from_hex()

Sat Feb 18 10:16:07 CET 2006

Ron Adam <rrr at ronadam.com> wrote:
> Josiah Carlson wrote:
> > Bengt Richter had a good idea with bytes.recode() for strictly bytes
> > transformations (and the equivalent for text), though it is ambiguous as
> > to the direction; are we encoding or decoding with bytes.recode()?  In
> > my opinion, this is why .encode() and .decode() makes sense to keep on
> > both bytes and text, the direction is unambiguous, and if one has even a
> > remote idea of what the heck the codec is, they know their result.
> > 
> >  - Josiah
> 
> I like the bytes.recode() idea a lot. +1
> 
> It seems to me it's a far more useful idea than encoding and decoding by 
> overloading and could do both and more.  It has a lot of potential to be 
> an intermediate step for encoding as well as being used for many other 
> translations to byte data.

Indeed it does.

> I think I would prefer that encode and decode be just functions with 
> well defined names and arguments instead of being methods or arguments 
> to string and Unicode types.

Attaching it to string and unicode objects is a useful convenience. 
Just like x.replace(y, z) is a convenience for string.replace(x, y, z) . 
Tossing the encode/decode somewhere else, like encodings, or even string,
I see as a backwards step.

> I'm not sure on exactly how this would work. Maybe it would need two 
> sets of encodings, ie.. decoders, and encoders.  An exception would be
> given if it wasn't found for the direction one was going in.
> 
> Roughly... something or other like:
> 
>      import encodings
> 
>      encodings.tostr(obj, encoding):
>         if encoding not in encoders:
>             raise LookupError 'encoding not found in encoders'
>         # check if obj works with encoding to string
>         # ...
>         b = bytes(obj).recode(encoding)
>         return str(b)
> 
>      encodings.tounicode(obj, decodeing):
>         if decoding not in decoders:
>             raise LookupError 'decoding not found in decoders'
>         # check if obj works with decoding to unicode
>         # ...
>         b = bytes(obj).recode(decoding)
>         return unicode(b)
> 
> Anyway... food for thought.

Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something? 
Are we going to need to embed the direction in the encoding/decoding
name (to_base64, from_base64, etc.)?  That doesn't any better than
binascii.b2a_base64 .  What about .reencode and .redecode?  It seems as
though the 're' added as a prefix to .encode and .decode makes it
clearer that you get the same type back as you put in, and it is also
unambiguous to direction.

The question remains: is str.decode() returning a string or unicode
depending on the argument passed, when the argument quite literally
names the codec involved, difficult to understand?  I don't believe so;
am I the only one?

 - Josiah