[Python-Dev] bytes.from_hex()

Ron Adam rrr at ronadam.com
Fri Mar 3 10:22:44 CET 2006


Greg Ewing wrote:
> Ron Adam wrote:
> 
>> This uses syntax to determine the direction of encoding.  It would be 
>> easier and clearer to just require two arguments or a tuple.
>>
>>       u = unicode(b, 'encode', 'base64')
>>       b = bytes(u, 'decode', 'base64')
> 
> The point of the exercise was to avoid using the terms
> 'encode' and 'decode' entirely, since some people claim
> to be confused by them.

Yes, that was what I was trying for with the tounicode, tostring 
(tobyte) suggestion, but the direction could become ambiguous as you 
pointed out.

The constructors above have 4 data items implied:
      1: The source object which includes the source type and data
      2: The codec to use
      3: The direction of the operation
      4: The destination type (determined by the constructor used)

There isn't any ambiguity other than when to use encode or decode, but
in this case that really is a documentation problem, because there are
no ambiguities in this form.  Everything is explicit.
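
As a rough sketch of that explicit form (written against a current
Python 3, where str plays the role of unicode; the helper name convert
and its argument order are made up for illustration and are not a
proposed API), all four items can be passed by hand:

```python
import codecs

def convert(source, direction, codec, dest_type):
    # Hypothetical helper, not an existing or proposed API.
    # source: the input object, codec: the codec name,
    # direction: 'encode' or 'decode', dest_type: the required result type.
    if direction == 'encode':
        result = codecs.encode(source, codec)
    elif direction == 'decode':
        result = codecs.decode(source, codec)
    else:
        raise ValueError("direction must be 'encode' or 'decode'")
    if not isinstance(result, dest_type):
        raise TypeError('%s %s produced %s, not %s' % (
            codec, direction, type(result).__name__, dest_type.__name__))
    return result

b = convert('abc', 'encode', 'utf-8', bytes)   # text -> bytes
u = convert(b, 'decode', 'utf-8', str)         # bytes -> text
```

Nothing is inferred from syntax here; a wrong direction or an
unexpected result type is reported instead of guessed at.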

Another version of the above, which might be preferable, was pointed
out to me offline.

   u = unicode(b, encode='base64')
   b = bytes(u, decode='base64')

This would also work with the tostring and tounicode methods.

   u = b.tounicode(decode='base64')
   b = u.tobytes(encode='base64')


> If we're going to continue to use 'encode' and 'decode',
> why not just make them functions:
> 
>    b = encode(u, 'utf-8')
>    u = decode(b, 'utf-8')

 >>> import codecs
 >>> codecs.decode('abc', 'ascii')
u'abc'

There's that time machine again. ;-)

> In the case of Unicode encodings, if you get them
> backwards you'll get a type error.
> 
> The advantage of using functions over methods or
> constructor arguments is that they can be applied
> uniformly to any input and output types.
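
A small illustration of both points, assuming a current Python 3 where
text is str and codecs.encode/codecs.decode are the function forms: the
same two functions handle a text codec and a bytes-to-bytes codec, and
reversing the direction of a Unicode encoding fails with a TypeError.

```python
import codecs

# Text codec: str -> bytes and back.
b = codecs.encode('abc', 'utf-8')
u = codecs.decode(b, 'utf-8')

# Bytes-to-bytes codec through the same two functions.
h = codecs.encode(b'abc', 'hex')
raw = codecs.decode(h, 'hex')

# Getting the direction backwards: encoding bytes with a text codec.
try:
    codecs.encode(b'abc', 'utf-8')
    backwards_failed = False
except TypeError:
    backwards_failed = True
```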

If codecs are to be more general, then there may be times when the
returned type needs to be specified.  This would apply to codecs that
could return either bytes or strings, or strings or unicode, or bytes or 
unicode.  Some inputs may equally work with more than one output type. 
Of course, the answer in these cases may be to just 'know' what you will 
get, and then convert it to what you want.
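
To make the "know what you will get" option concrete, here is a sketch
assuming a current Python 3 and its stock codecs.  Which type comes back
is a property of each codec, and the variable names are only
illustrative:

```python
import codecs

as_bytes = codecs.encode('abc', 'utf-8')      # str in, bytes out
still_text = codecs.encode('abc', 'rot13')    # str in, str out
b64 = codecs.encode(b'abc', 'base64')         # bytes in, bytes out

# Convert afterwards when a different type is wanted.
b64_text = b64.decode('ascii')
```

The caller has to know each codec's result type in advance; nothing in
the call itself states it.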

Cheers,
Ron



