rrr at ronadam.com
Wed Mar 1 15:49:09 CET 2006
Nick Coghlan wrote:
> All the unicode codecs, on the other hand, use encode to get from characters
> to bytes and decode to get from bytes to characters.
> So if bytes objects *did* have an encode method, it should still result in a
> unicode object, just the same as a decode method does (because you are
> encoding bytes as characters), and unicode objects would acquire a
> corresponding decode method (that decodes from a character format such as
> base64 to the original byte sequence).
> In the name of TOOWTDI, I'd suggest that we just eat the slight terminology
> glitch in the rare cases like base64, hex and oct (where the character format
> is technically the encoded format), and leave it so that there is a single
> method pair (bytes.decode to go from bytes to characters, and text.encode to
> go from characters to bytes).
I think you have it pretty straight here.
While playing around with the example bytes class, I noticed that the code
reads much better when I use methods called tounicode and tostring:
    b64ustring = b.tounicode('base64')
    b = bytes(b64ustring, 'base64')
The bytes type could then use the string decode codec for string-to-string
decoding, rather than ignoring it:
    b64string = b.tostring('base64')
    b = bytes(b64string, 'base64')
    b = bytes(hexstring, 'hex')
    hexstring = b.tostring('hex')
    hexstring = b.tounicode('hex')
An exception could be raised if the codec does not support the input or
output type, depending on the situation.
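The following is only a rough sketch of how such a to-type interface might
behave, written in present-day Python rather than the 2006 prototype. The
class name Bytes, the _CODECS table, and the choice of LookupError are all
assumptions made for illustration; Python 3's bytes stands in for the old
8-bit str that tostring would have returned.

```python
import base64
import binascii

# Hypothetical codec table: name -> (bytes to ASCII text form, and back).
# Only codecs listed here are treated as supporting bytes input/output.
_CODECS = {
    "base64": (base64.b64encode, base64.b64decode),
    "hex": (binascii.hexlify, binascii.unhexlify),
}


def _lookup(codec):
    """Return the (encode, decode) pair or raise if the codec is unsupported."""
    try:
        return _CODECS[codec]
    except KeyError:
        raise LookupError("codec %r does not map bytes to text" % codec)


class Bytes:
    """Toy stand-in for the proposed bytes type (not the real built-in)."""

    def __init__(self, source=b"", codec=None):
        if codec is None:
            self.data = bytes(source)  # raises TypeError for plain text input
            return
        decode = _lookup(codec)[1]
        if isinstance(source, str):
            source = source.encode("ascii")  # text form is ASCII-only
        self.data = decode(source)

    def tostring(self, codec):
        """Return the encoded form as an 8-bit string (bytes in Python 3)."""
        return _lookup(codec)[0](self.data)

    def tounicode(self, codec):
        """Return the encoded form as a text (unicode) string."""
        return self.tostring(codec).decode("ascii")


b = Bytes(b"hello")
b64ustring = b.tounicode("base64")
hexstring = b.tostring("hex")
roundtrip = Bytes(b64ustring, "base64")
```

Asking for a codec that is not in the table, for example
b.tounicode("utf-8"), raises LookupError in this sketch, which is one way
the "unsupported input or output type" case could be reported.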
This would allow different types of codecs to live together without
as much confusion, I think.
I'm not suggesting we start using to-type everywhere, just where it
might make things clearer over decode and encode.
Expecting it not to fly, but just maybe it could?