jcarlson at uci.edu
Sat Feb 18 10:16:07 CET 2006
Ron Adam <rrr at ronadam.com> wrote:
> Josiah Carlson wrote:
> > Bengt Richter had a good idea with bytes.recode() for strictly bytes
> > transformations (and the equivalent for text), though it is ambiguous as
> > to the direction; are we encoding or decoding with bytes.recode()? In
> > my opinion, this is why .encode() and .decode() make sense to keep on
> > both bytes and text, the direction is unambiguous, and if one has even a
> > remote idea of what the heck the codec is, they know their result.
> > - Josiah
> I like the bytes.recode() idea a lot. +1
> It seems to me it's a far more useful idea than encoding and decoding by
> overloading and could do both and more. It has a lot of potential to be
> an intermediate step for encoding as well as being used for many other
> translations to byte data.
Indeed it does.
> I think I would prefer that encode and decode be just functions with
> well defined names and arguments instead of being methods or arguments
> to string and Unicode types.
Attaching it to string and unicode objects is a useful convenience.
Just like x.replace(y, z) is a convenience for string.replace(x, y, z).
Tossing the encode/decode somewhere else, like encodings, or even string,
I see as a backwards step.
> I'm not sure on exactly how this would work. Maybe it would need two
> sets of encodings, i.e., decoders and encoders. An exception would be
> given if it wasn't found for the direction one was going in.
> Roughly... something or other like:
> import encodings
>
> encodings.tostr(obj, encoding):
>     if encoding not in encoders:
>         raise LookupError('encoding not found in encoders')
>     # check if obj works with encoding to string
>     # ...
>     b = bytes(obj).recode(encoding)
>     return str(b)
>
> encodings.tounicode(obj, decoding):
>     if decoding not in decoders:
>         raise LookupError('decoding not found in decoders')
>     # check if obj works with decoding to unicode
>     # ...
>     b = bytes(obj).recode(decoding)
>     return unicode(b)
>
> Anyway... food for thought.
Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something?
Are we going to need to embed the direction in the encoding/decoding
name (to_base64, from_base64, etc.)? That doesn't seem any better than
binascii.b2a_base64. What about .reencode and .redecode? It seems as
though the 're' added as a prefix to .encode and .decode makes it
clearer that you get the same type back as you put in, and it is also
unambiguous to direction.
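
To make the ambiguity concrete, here is a minimal sketch written against
present-day Python 3 (an assumption; this thread predates it). The
recode() function and its direction parameter are hypothetical, invented
only to show what a single-name API would have to ask of the caller;
codecs.encode(), codecs.decode() and binascii.b2a_base64() are the real
standard-library calls.

```python
import binascii
import codecs


def recode(data: bytes, codec: str, direction: str) -> bytes:
    """Hypothetical bytes-to-bytes transform (not a real bytes method).

    With only one name for both operations, the caller has to spell out
    the direction some other way; here it is an extra argument.
    """
    if direction == "to":
        return codecs.encode(data, codec)
    if direction == "from":
        return codecs.decode(data, codec)
    raise ValueError("direction must be 'to' or 'from'")


raw = b"hello"

# Same codec name, opposite meanings: the name alone does not say which.
encoded = recode(raw, "base64", "to")
decoded = recode(encoded, "base64", "from")

# binascii bakes the direction into the function name instead.
b2a = binascii.b2a_base64(raw)
```

With separate encode and decode names, the direction argument disappears
and the call site says which way the data is going.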
The question remains: is it difficult to understand that str.decode()
returns a string or unicode depending on the argument passed, when that
argument quite literally names the codec involved? I don't believe so;
am I the only one?
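
As a rough illustration of a result type that follows the codec name, the
sketch below uses codecs.decode() from present-day Python 3 as a stand-in
for the str.decode() discussed here (an assumption; Python 3's own
bytes.decode() accepts only text encodings). The variable names are made
up for the example.

```python
import codecs

# A text codec: bytes go in, text (str) comes out.
text = codecs.decode(b"caf\xc3\xa9", "utf-8")

# A bytes-to-bytes codec: bytes go in, bytes come out.
blob = codecs.decode(b"aGVsbG8=\n", "base64")

# The codec named in the call is what determines the result type.
result_types = (type(text).__name__, type(blob).__name__)
```

Anyone who knows that utf-8 is a text encoding and base64 is a byte
transform can predict both result types from the call alone.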