
Nick Coghlan wrote:
> All the unicode codecs, on the other hand, use encode to get from characters to bytes and decode to get from bytes to characters.
>
> So if bytes objects *did* have an encode method, it should still result in a unicode object, just the same as a decode method does (because you are encoding bytes as characters), and unicode objects would acquire a corresponding decode method (that decodes from a character format such as base64 to the original byte sequence).
>
> In the name of TOOWTDI, I'd suggest that we just eat the slight terminology glitch in the rare cases like base64, hex and oct (where the character format is technically the encoded format), and leave it so that there is a single method pair (bytes.decode to go from bytes to characters, and text.encode to go from characters to bytes).

I think you have it pretty straight here.

While playing around with the example bytes class, I noticed that code reads much better when I use methods called tounicode and tostring:

    b64ustring = b.tounicode('base64')
    b = bytes(b64ustring, 'base64')

The bytes type could then *not* ignore the string decode codec, but use it for string-to-string decoding:

    b64string = b.tostring('base64')
    b = bytes(b64string, 'base64')

    b = bytes(hexstring, 'hex')
    hexstring = b.tostring('hex')
    hexstring = b.tounicode('hex')

An exception could be raised if the codec does not support the input or output type, depending on the situation. I think this would allow different types of codecs to live together with less confusion.

I'm not suggesting we start using to-type everywhere, just where it might make things clearer than decode and encode.

Expecting it not to fly, but just maybe it could?

Ron
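
A rough sketch of how the to-type spelling might behave, written as plain Python 3 functions because methods can't be added to the built-in bytes type. The names tounicode, tostring and bytes_from (standing in for bytes(s, codec)) are hypothetical, and mapping 'base64' and 'hex' onto the standard base64 and binascii functions is an assumption for illustration, not how Python's codec registry actually works:

```python
import base64
import binascii

# Hypothetical helpers sketching the proposed to-type spelling.
# None of these names exist on Python's real bytes type.

# Assumed binary-to-text codecs: name -> (bytes -> ASCII bytes, text -> bytes)
_BINARY_TO_TEXT = {
    "base64": (base64.b64encode, base64.b64decode),
    "hex": (binascii.hexlify, binascii.unhexlify),
}


def tostring(b: bytes, codec: str) -> bytes:
    """bytes -> 8-bit string in a character format such as base64 or hex."""
    try:
        encode, _ = _BINARY_TO_TEXT[codec]
    except KeyError:
        # Text codecs like utf-8 have no bytes -> byte-string direction here.
        raise TypeError(f"codec {codec!r} does not produce a byte string") from None
    return encode(b)


def tounicode(b: bytes, codec: str) -> str:
    """bytes -> text, for binary-to-text codecs and ordinary text codecs alike."""
    if codec in _BINARY_TO_TEXT:
        return tostring(b, codec).decode("ascii")
    return b.decode(codec)  # ordinary case: the usual bytes.decode direction


def bytes_from(s, codec: str) -> bytes:
    """Stand-in for bytes(s, codec): recover the original byte sequence."""
    if codec in _BINARY_TO_TEXT:
        _, decode = _BINARY_TO_TEXT[codec]
        return decode(s)  # accepts ASCII str or bytes
    if isinstance(s, str):
        return s.encode(codec)  # ordinary case: the usual str.encode direction
    raise TypeError(f"codec {codec!r} expects text input")


if __name__ == "__main__":
    data = b"\x00\xffhi"
    print(tounicode(data, "base64"))  # AP9oaQ==
    print(tostring(data, "hex"))      # b'00ff6869'
    print(bytes_from("AP9oaQ==", "base64") == data)  # True
```

The per-direction TypeError mirrors the suggestion that a codec should raise when it doesn't support the requested input or output type, so binary-to-text and text codecs can share one call style.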