[Python-Dev] bytes.from_hex()
Josiah Carlson
jcarlson at uci.edu
Mon Feb 20 05:28:41 CET 2006
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> >>>>> "Josiah" == Josiah Carlson <jcarlson at uci.edu> writes:
>
> Josiah> The question remains: is str.decode() returning a string
> Josiah> or unicode depending on the argument passed, when the
> Josiah> argument quite literally names the codec involved,
> Josiah> difficult to understand? I don't believe so; am I the
> Josiah> only one?
>
> Do you do any of the user education *about codec use* that you
> recommend? The people I try to teach about coding invariably find it
> difficult to understand. The problem is that the near-universal
> intuition is that for "human-usable text" pretty much anything *but
> Unicode* will do. This is a really hard block to get them past.
> There is very good reason why Unicode is plain text ("original" in
> MAL's terms) and everything else is encoded ("derived"), but students
> new to the concept often take a while to "get" it.
I've not been teaching Python; when I was still a TA, it was strictly
algorithms and data structures. Of those people whom I have had the
opportunity to entice into Python, I've not followed up on their
progress, so I don't know whether they had any issues.
I try to internalize it by not thinking of strings as encoded data, but
as binary data, and unicode as text. I then remind myself that unicode
isn't native on-disk or cross-network (which stores and transports bytes,
not characters), so one needs to encode it as binary data. It's a
subtle difference, but it has worked so far for me.
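
A minimal sketch of that mental model follows. It is written for modern Python 3, which is an assumption on top of this thread: there `bytes` plays the role that str plays here (binary data) and `str` plays the role of unicode (text). The variable names are illustrative only.

```python
# Text is a sequence of characters; it has no byte representation yet.
text = "na\u00efve caf\u00e9"

# Going to disk or across the network means choosing an encoding,
# which turns the text into binary data.
data = text.encode("utf-8")

# Reading it back means decoding the binary data with the same encoding.
round_trip = data.decode("utf-8")

# The character count and the byte count differ, which is the reminder
# that text and its encoded form are different things.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
```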
In my experience, at least for English-only speakers, most people
don't even get to unicode. I didn't touch it myself until I was well
versed in encoding and decoding all different kinds of binary
data, when a half-dozen international users (China, Japan, Russia, ...)
requested support for it in my source editor, so I added it. Supporting it
properly hasn't been very difficult, and the only real nit I have
experienced is supporting the encoding line just after the #! line for
arbitrary codecs (sometimes saving a file in a particular encoding dies).
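
As a rough illustration of that nit, the sketch below reads the encoding line from the top of a source file and checks whether some text can actually be saved in that encoding. It assumes modern Python 3 and uses the standard library's tokenize.detect_encoding(); the helper names declared_encoding and can_save are hypothetical and are not taken from any editor.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding named on the first or second line of a source file."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def can_save(text, encoding):
    """Report whether every character in text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


source = b"#!/usr/bin/env python\n# -*- coding: koi8-r -*-\nprint('hi')\n"
enc = declared_encoding(source)

russian = "\u043f\u0440\u0438\u0432\u0435\u0442"
japanese = "\u65e5\u672c\u8a9e"

print(enc, can_save(russian, enc), can_save(japanese, enc))
```

Checking before writing lets an editor warn the user instead of failing partway through a save.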
I notice that you seem to be in Japan, so teaching unicode is a must.
If you are using the "unicode is text" and "strings are data" framing and
they still aren't getting it, then I don't know what else to suggest.
> Maybe it's just me, but whether it's the teacher or the students, I am
> *not* excited about the education route. Martin's simple rule *is*
> simple, and the exceptions for using a "nonexistent" method mean I
> don't have to reinforce---the students will be able to teach each
> other. The exceptions also directly help reinforce the notion that
> text == Unicode.
Are you sure that they would help? If .encode() and .decode() are dropped
from strings and unicode (respectively), users get an AttributeError. That's
almost useless. Raising a better exception (with more information)
would be better in that case, but losing the functionality that either
would offer seems unnecessary, which is why I had suggested some of the
other method names. Perhaps a message such as "This method was removed because it
confused users. Use help(str.encode) (or unicode.decode) to find out
how you can do the equivalent, or do what you *really* wanted to do."
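
To make the contrast concrete, here is a hedged sketch in modern Python 3, where str is the text type and already lacks .decode(). The Text subclass, its message wording, and the choice of TypeError are all assumptions made for illustration; nothing like this exists in the standard library.

```python
class Text(str):
    """Hypothetical text type whose decode() explains why it is gone."""

    def decode(self, *args, **kwargs):
        raise TypeError(
            "Text.decode() was removed because text is already decoded. "
            "Use .encode(encoding) to turn text into binary data."
        )


# The bare failure: the method simply does not exist.
try:
    "abc".decode("utf-8")
except AttributeError as exc:
    bare_message = str(exc)

# The informative failure: the method exists only to say what to do instead.
try:
    Text("abc").decode("utf-8")
except TypeError as exc:
    helpful_message = str(exc)

print(bare_message)
print(helpful_message)
```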
> I grant the point that .decode('base64') is useful, but I also believe
> that "education" is a lot more easily said than done in this case.
What I meant by "education" is 'better documentation' and 'better
exception messages'. I didn't learn Python by sitting in a class; I
learned it by going through the tutorial over a weekend as a 2nd year
undergrad and writing software which could do what I wanted/needed.
Compared to the compiler messages I'd been seeing from Codewarrior and
MSVC 6, Python exceptions were like an oracle. I can understand how
first-time programmers can have issues with *some* Python exception
messages, which is why I think that we could use better ones. There is
also the other issue that sometimes people fail to actually read the
messages.
Again, I don't believe that an AttributeError is any better than an
"ordinal not in range(128)", but "You are trying to encode/decode
to/from incompatible types. expected: a->b got: x->y" is better. Some
of those can be done *very soon*, given the capabilities of the
encodings module, and they could likely be easily migrated, regardless
of the decisions with .encode()/.decode().
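
One way such a message could be produced is sketched below, under the assumption of modern Python 3 (str for text, bytes for binary data). The wrapper name checked_encode is hypothetical; it only checks the input type and then defers to the real codecs.encode() function.

```python
import codecs


def checked_encode(value, encoding):
    """Encode text to bytes, with an explicit message on a type mismatch."""
    if not isinstance(value, str):
        raise TypeError(
            "You are trying to encode from an incompatible type. "
            "expected: str->bytes got: %s->bytes" % type(value).__name__
        )
    return codecs.encode(value, encoding)


encoded = checked_encode("abc", "ascii")

try:
    checked_encode(b"abc", "ascii")
except TypeError as exc:
    mismatch_message = str(exc)

print(encoded)
print(mismatch_message)
```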
- Josiah