[Python-Dev] bytes.from_hex()

Thu Feb 23 08:18:42 CET 2006

Stephen J. Turnbull wrote:
>>>>>> "Ron" == Ron Adam <rrr at ronadam.com> writes:
> 
>     Ron> Terry Reedy wrote:
> 
>     >> I prefer the shorter names and using recode, for instance, for
>     >> bytes to bytes.
> 
>     Ron> While I prefer constructors with an explicit encode argument,
>     Ron> and use a recode() method for 'like to like' coding. 
> 
> 'Recode' is a great name for the conceptual process, but the methods
> are directional.  Also, in internationalization work, "recode"
> strongly connotes "encodingA -> original -> encodingB", as in iconv.

We could call it transform or translate if needed.  Words are reused 
constantly in languages, so I don't think it's a sticking point.  As 
long as its meaning is documented well and doesn't change later, I think 
it would be just fine.  If the concept of not having encode and decode 
as methods works (and has support from someone other than me), the name 
can be decided later.

> I do prefer constructors, as it's generally not a good idea to do
> encoding/decoding in-place for human-readable text, since the codecs
> are often lossy.
> 
>     Ron> Then the whole encode/decode confusion goes away.
> 
> Unlikely.  Errors like "A string".encode("base64").encode("base64")
> are all too easy to commit in practice.

Yes... and wouldn't the above just result in a copy, so it wouldn't be 
an outright error?  But I understand that you mean similar cases where 
consecutive calls would change the bytes.  In any case, I was referring 
to the confusion with the method names and how they are used.
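
For reference, here is a small check of what consecutive base64 calls do. 
It is only a sketch: it uses the base64 module on a bytes object (Python 3 
syntax) rather than the "A string".encode("base64") spelling quoted above, 
and the variable names are made up for illustration.

```python
import base64

original = b"A string"
once = base64.b64encode(original)
twice = base64.b64encode(once)   # the accidental second encode

# The second call is not a no-op: it encodes the base64 text again.
print(once != twice)

# Two decodes are needed to get back to the original bytes.
print(base64.b64decode(base64.b64decode(twice)) == original)
```

So with this spelling, at least, the second call changes the bytes rather 
than returning a copy.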

This is how I was thinking of it.

    * Given that the string type gains a __codec__ attribute to handle 
automatic decoding when needed (is there a reason not to?), the 
constructors would be:

       str(object[,codec][,error]) -> string coded with codec

       unicode(object[,error]) -> unicode

       bytes(object) -> bytes

    * A recode() method is used for transformations that *do not* 
change the current codec.

See any problems with it?  (Other than from gross misuse of course and 
your dislike of 'recode' as the name.)
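
A rough sketch of the scheme, to make it concrete.  Everything here is an 
assumption for illustration: CodedStr, to_unicode(), to_bytes() and the 
exact recode() signature are invented names, the syntax is Python 3, and 
recode() is read as "apply a text-to-text codec and keep the stored codec", 
which is only one possible reading of 'like to like'.

```python
import codecs


class CodedStr:
    """Hypothetical byte string that remembers which codec it is stored in."""

    def __init__(self, text, codec="ascii", errors="strict"):
        # Roughly str(object[,codec][,error]): encode once, remember the codec.
        codecs.lookup(codec)  # raises LookupError for an unknown codec name
        self.codec = codec
        self.data = text.encode(codec, errors)

    def to_unicode(self, errors="strict"):
        # Roughly unicode(object[,error]): no codec argument is needed.
        return self.data.decode(self.codec, errors)

    def to_bytes(self):
        # Roughly bytes(object): the raw stored bytes.
        return self.data

    def recode(self, transform):
        # Like-to-like: apply a text-to-text codec such as "rot_13" and
        # store the result in the same codec as before.
        changed = codecs.encode(self.to_unicode(), transform)
        return CodedStr(changed, self.codec)


s = CodedStr("Hello", "latin-1")
r = s.recode("rot_13")
print(r.codec, r.to_unicode())
```

The caller never passes a codec name when decoding, which is where the 
encode/decode naming confusion would otherwise show up.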

There may still be a __decode__() method on strings to do the actual 
decoding, but it wouldn't be part of the public interface.  Or it could 
call a function from the codec to do it.

     return self.codec.decode(self)
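
Spelled out a little more, that delegation might look like the sketch 
below.  CodedBytes is a made-up class and __decode__ is not an existing 
special method; the only real API used is codecs.lookup(), whose result 
has a decode function that returns a (text, length consumed) pair.

```python
import codecs


class CodedBytes(bytes):
    """Hypothetical bytes subclass carrying a codec attribute."""

    def __new__(cls, data, codec="utf-8"):
        self = super().__new__(cls, data)
        # Store the codec object itself so decoding can be delegated to it.
        self.codec = codecs.lookup(codec)
        return self

    def __decode__(self, errors="strict"):
        # Non-public hook: the codec does the real work.
        text, _consumed = self.codec.decode(self, errors)
        return text


raw = CodedBytes("h\u00e9llo".encode("utf-8"), "utf-8")
print(raw.__decode__())
```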

The only sticking point I see is whether an additional attribute on 
strings would increase the memory used when there are many small strings. 
That may be why it wasn't done this way to start with.  (?)
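
One way to get a feel for the per-object cost is to compare two tiny 
classes that differ by one attribute slot.  This is only an analogy: Plain 
and WithCodec are made-up classes, not the real string type, and the 
number printed is specific to CPython and the platform.

```python
import sys


class Plain:
    __slots__ = ("data",)


class WithCodec:
    __slots__ = ("data", "codec")


# Each extra slot costs one more pointer-sized field per instance.
extra = sys.getsizeof(WithCodec()) - sys.getsizeof(Plain())
print("extra bytes per instance:", extra)
```

Multiplied over many small strings, that fixed overhead is the cost in 
question.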

Cheers,
    Ron