[Python-Dev] bytes.from_hex()

Fri Feb 24 11:15:34 CET 2006

>>>>> "Ron" == Ron Adam <rrr at ronadam.com> writes:

    Ron> We could call it transform or translate if needed.

You're still losing the directionality, which is my primary objection
to "recode".  The absence of directionality is precisely why "recode"
is used in that sense for i18n work.

There really isn't a good reason that I can see to use anything other
than the pair "encode" and "decode".  In monolingual environments,
once _all_ human-readable text (specifically including Python programs
and console I/O) is automatically mapped to a Python (unicode) string,
most programmers will never need to think about it.  The condition is
that Python (the project) very strongly encourages that all Python
programs be written in UTF-8 if there's any chance the program will be
reused in a locale other than the one where it was written.
(Alternatively, you can depend on PEP 263 coding cookies.)  Then the
user (or the Python interpreter) just changes the console and file I/O
codecs to the encoding in use in that locale, and everything just works.
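
As a rough illustration of that last step, here is a sketch in
present-day Python 3 (not anything that existed when this thread was
written).  The program only ever handles text, and the locale's
encoding is chosen once at the I/O boundary.  The helper name
`roundtrip` and the sample encodings are made up for the example.

```python
import io

def roundtrip(text, encoding):
    """Write text through a codec-configured stream, then read it back.

    Hypothetical helper: it stands in for console or file I/O whose
    codec is set once for the locale.
    """
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()
    data = raw.getvalue()   # the bytes as they would appear on disk
    writer.detach()         # release the buffer without closing it
    reader = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return data, reader.read()

text = "na\u00efve"
latin1_bytes, latin1_text = roundtrip(text, "latin-1")
utf8_bytes, utf8_text = roundtrip(text, "utf-8")
```

The byte strings differ between the two encodings, but the text the
program sees is the same in both cases, which is the sense in which
"everything just works".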

So the remaining uses of "encode" and "decode" are for advanced users
and specialists: people using stuff like base64 or gzip, and those who
need to use unicode codecs explicitly.

I could be wrong about whether it's possible to get rid of explicit
unicode codec use in monolingual environments, but I hope that we can
at least try to achieve that.

    >> Unlikely.  Errors like "A
    >> string".encode("base64").encode("base64") are all too easy to
    >> commit in practice.

    Ron> Yes,... and wouldn't the above just result in a copy so it
    Ron> wouldn't be an outright error.

No, you either get the following:

A string. -> QSBzdHJpbmcu -> UVNCemRISnBibWN1

or you might get an error if base64 is defined as bytes->unicode.
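
The chain above can be reproduced with the standard `base64` module
in present-day Python 3, where `b64encode` maps bytes to bytes (so
the double application is silently accepted rather than rejected).
This is only a sketch; the variable names are mine.

```python
import base64

original = b"A string."

# Encoding twice is not a no-op: the second pass encodes the
# base64 text itself.
once = base64.b64encode(original)
twice = base64.b64encode(once)

# Getting the original back requires exactly two decodes.
restored = base64.b64decode(base64.b64decode(twice))
```

Here `once` is `b"QSBzdHJpbmcu"` and `twice` is `b"UVNCemRISnBibWN1"`,
matching the chain shown above.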

    Ron>     * Given that the string type gains a __codec__ attribute
    Ron> to handle automatic decoding when needed.  (is there a reason
    Ron> not to?)

    Ron>        str(object[,codec][,error]) -> string coded with codec

    Ron>        unicode(object[,error]) -> unicode

    Ron>        bytes(object) -> bytes

str == unicode in Py3k, so this is a non-starter.  What do you want to
say?

    Ron>      * a recode() method is used for transformations that
    Ron> *do_not* change the current codec.

I'm not sure what you mean by the "current codec".  If it's attached
to an "encoded object", it should be the codec needed to decode the
object.  And it should be allowed to be a "codec stack".  So suppose
you start with a unicode object "obj".  Then

>>> bytes = bytes (obj, 'utf-8')    # implicit .encode()
>>> print bytes.codec
['utf-8']
>>> wire = bytes.encode ('base64')  # with apologies to Greg E.
>>> print wire.codec
['base64', 'utf-8']
>>> obj2 = wire.decode ('gzip')
CodecMatchException
>>> obj2 = wire.decode (wire.codec)
>>> print obj == obj2
True
>>> print obj2.codec
[]

or maybe None for the last.  I think this would be very nice as a
basis for improving the email module (for one), but I don't really
think it belongs in Python core.
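
For concreteness, a minimal sketch of the "codec stack" idea in
present-day Python 3, assuming a wrapper class rather than any change
to the built-in types.  `Encoded`, `from_text`, and
`CodecMismatchError` are hypothetical names invented for this
example; only `codecs.encode` and `codecs.decode` are real library
calls.

```python
import codecs

class CodecMismatchError(ValueError):
    """Raised when the requested codecs don't match the recorded stack."""

class Encoded:
    """Bytes plus the stack of codecs needed to decode them."""

    def __init__(self, data, codec):
        self.data = data
        self.codec = list(codec)  # outermost codec first

    @classmethod
    def from_text(cls, text, codec="utf-8"):
        # Text -> bytes: the stack starts with the text codec.
        return cls(text.encode(codec), [codec])

    def encode(self, codec):
        # Bytes -> bytes transform: push the new codec on the stack.
        return Encoded(codecs.encode(self.data, codec), [codec] + self.codec)

    def decode(self, stack):
        # Refuse to decode with anything but the recorded stack.
        if list(stack) != self.codec:
            raise CodecMismatchError(
                "expected %r, got %r" % (self.codec, list(stack)))
        value = self.data
        for name in self.codec:
            value = codecs.decode(value, name)
        return value

obj = "A string."
raw = Encoded.from_text(obj)       # raw.codec is ['utf-8']
wire = raw.encode("base64")        # wire.codec is ['base64', 'utf-8']
obj2 = wire.decode(wire.codec)     # unwinds the whole stack to text
```

In this sketch, decoding the full stack returns a plain `str`, which
carries no codec at all, roughly the "or maybe None" case mentioned
above.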

    Ron> That may be why it wasn't done this way to start.  (?)

I suspect the real reason is that Marc-Andre had the generalized codec
in mind from Day 0, and your proposal only works with duck-typing if
codecs always have a well-defined signature with two different types
for the argument and return of the "constructor".

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.