[Python-Dev] bytes.from_hex()

Wed Mar 1 14:10:22 CET 2006

Bill Janssen wrote:
> Greg Ewing wrote:
>> Bill Janssen wrote:
>>
>>> bytes -> base64 -> text
>>> text -> de-base64 -> bytes
>> It's nice to hear I'm not out of step with
>> the entire world on this. :-)
> 
> Well, I can certainly understand the bytes->base64->bytes side of
> things too.  The "text" produced is specified as using "a 65-character
> subset of US-ASCII", so that's really bytes.

If the base64 codec were a text<->bytes codec, and bytes did not have an encode 
method, then to convert your original bytes to ASCII bytes, you 
would do:

   ascii_bytes = orig_bytes.decode("base64").encode("ascii")

"Use base64 to convert my byte sequence to characters, then give me the 
corresponding ASCII byte sequence"

To reverse the process:

   orig_bytes = ascii_bytes.decode("ascii").encode("base64")

"Use ascii to convert my byte sequence to characters, then use base64 to 
convert those characters back to the original byte sequence"
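
As a rough sketch of that round trip: the two helper functions below are 
made-up names standing in for the hypothetical bytes.decode("base64") and 
text.encode("base64") spellings, and the code assumes a Python with separate 
bytes and str types plus the standard base64 module. It only illustrates the 
direction of each step, not a proposed implementation.

```python
import base64


def b64_bytes_to_text(data: bytes) -> str:
    """Stand-in for the hypothetical data.decode("base64"): bytes -> characters."""
    return base64.b64encode(data).decode("ascii")


def b64_text_to_bytes(text: str) -> bytes:
    """Stand-in for the hypothetical text.encode("base64"): characters -> bytes."""
    return base64.b64decode(text.encode("ascii"))


orig_bytes = b"hello"

# bytes -> characters (via base64) -> ASCII bytes
ascii_bytes = b64_bytes_to_text(orig_bytes).encode("ascii")

# ASCII bytes -> characters (via ascii) -> original bytes (via base64)
round_trip = b64_text_to_bytes(ascii_bytes.decode("ascii"))
```

Here ascii_bytes holds the base64 form as a byte sequence, and round_trip 
compares equal to orig_bytes.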

The only slightly odd aspect is that this inverts the conventional meaning of 
base64 encoding and decoding, where you expect to encode from bytes to 
characters and decode from characters to bytes.

As strings currently have both encode and decode methods, the existing codec 
is able to use the conventional sense for base64: encode goes from 
"str-as-bytes" to "str-as-text" (giving a longer string with characters that 
fit in the base64 subset) and decode goes from "str-as-text" to 
"str-as-bytes" (giving back the original string).
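
A minimal sketch of that conventional direction, written with the 
codecs.encode and codecs.decode functions rather than the string methods 
themselves. This assumes a Python where the base64 codec is exposed as a 
bytes-to-bytes transform, so both sides are byte strings here.

```python
import codecs

raw = b"hello"

# Conventional sense: "encode" produces the longer base64 form ...
encoded = codecs.encode(raw, "base64")

# ... and "decode" gives back the original data.
decoded = codecs.decode(encoded, "base64")
```

Note that this codec appends a trailing newline to its output, so encoded is 
b"aGVsbG8=\n" rather than the bare base64 characters.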

All the unicode codecs, on the other hand, use encode to get from characters 
to bytes and decode to get from bytes to characters.
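
For comparison, a short sketch of that direction with a character encoding, 
assuming a Python where text and bytes are distinct types (the sample string 
is just an illustration):

```python
text = "caf\u00e9"           # four characters, the last one non-ASCII

data = text.encode("utf-8")  # encode: characters -> bytes
back = data.decode("utf-8")  # decode: bytes -> characters
```

The encoded form is five bytes long, because the final character takes two 
bytes in UTF-8, and decoding restores the original four characters.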

So if bytes objects *did* have an encode method, it should still result in a 
unicode object, just the same as a decode method does (because you are 
encoding bytes as characters), and unicode objects would acquire a 
corresponding decode method (that decodes from a character format such as 
base64 to the original byte sequence).

In the name of TOOWTDI, I'd suggest that we just eat the slight terminology 
glitch in the rare cases like base64, hex and oct (where the character format 
is technically the encoded format), and leave it so that there is a single 
method pair (bytes.decode to go from bytes to characters, and text.encode to 
go from characters to bytes).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org