[Python-Dev] PEP 460 reboot

Stephen J. Turnbull stephen at xemacs.org
Tue Jan 14 10:54:58 CET 2014


Guido van Rossum writes:

 > Of course, nobody in their right mind would use a format string
 > containing UTF-16 or EBCDIC.

How about Shift JIS and Big 5 (traditionally "mandated by Microsoft"
in their respective regions, with Shift JIS still overwhelmingly
popular) and GB* ("GB18030 is not just a good idea, It's The Law")?
Are the Japanese and Chinese crazy by definition?  This is where I get
the willies -- not because I think you consider anybody crazy by
definition, but because I personally have to live with people who use
crazy encodings for interoperability reasons.  In fact, about half the
text I process daily for work is in those encodings.

Anyway, the thought makes me shiver.  GB2312 text may be encoded as
EUC-CN, in which case it is ASCII-compatible, so no problem.  I'm not
sure if that's the encoding typically denoted by "GB2312" in email,
though, and in any case it's irrelevant as most emails claiming
"charset=GB2312" I receive nowadays include characters from the
extension repertoires of GBK or GB18030.  Shift JIS, Big 5, and GBK
never use "%", the byte that introduces a Python %-formatting
directive, as a trail byte of a multibyte character (yay!), but .format
is right out because the bytes for { and } are used as trail bytes.
GB18030 in principle uses far more of the code space, including
all of the syntactically significant punctuation, but in practice I
don't know how many of those characters are actually assigned, let
alone used.
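
The trail-byte claim for Shift JIS, Big 5, and GBK can be checked
mechanically.  The following is a minimal sketch, not part of the
original message: the helper name ascii_trail_bytes is hypothetical,
and it relies on the codecs shipped with CPython ("shift_jis", "big5",
"gbk"), scanning only BMP code points.

```python
def ascii_trail_bytes(codec):
    """Collect ASCII byte values that appear after the first byte of a
    multibyte sequence in `codec` (hypothetical helper, BMP only)."""
    found = set()
    for cp in range(0x80, 0x10000):
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # surrogate, or not representable in this codec
        if len(encoded) > 1:
            # Only bytes after the lead byte can masquerade as ASCII.
            found.update(b for b in encoded[1:] if b < 0x80)
    return found


for codec in ("shift_jis", "big5", "gbk"):
    trail = ascii_trail_bytes(codec)
    print(codec,
          "% as trail byte:", ord("%") in trail,
          "{ as trail byte:", ord("{") in trail,
          "} as trail byte:", ord("}") in trail)
```

The same helper can be pointed at other codec names to see which ASCII
bytes a given encoding reuses.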

 > And that is precisely my point. When you're using a format string,
 > all of the format string (not just the part between { and }) had
 > better use ASCII or an ASCII superset. And this (rightly)
 > constrains the output to an ASCII superset as well.

Except that if you interpolate something like Shift JIS, many of the
bytes in the ASCII range aren't really ASCII characters; they are trail
bytes of multibyte characters.  That's a general issue, of course, if you
do something that requires iterated format strings, but it's far more
likely to appear to work most of the time with those encodings.
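
To make the point about interpolated Shift JIS concrete, here is a
small sketch.  It assumes a Python where bytes objects support
%-formatting (3.5 and later, which postdates this message); the
variable names are illustrative only.

```python
# Katakana "BO" encodes to the two bytes 0x83 0x7B in Shift JIS;
# the second byte has the same value as ASCII "{".
name = "ボ".encode("shift_jis")

# Interpolate the encoded text into an ASCII byte template.
line = b"label=%b" % name

# A byte-level search finds a "{" ...
brace_in_bytes = b"{" in line
# ... but the decoded text contains no such character.
brace_in_text = "{" in line.decode("shift_jis")

print(name, brace_in_bytes, brace_in_text)
```

Any later step that treats the result as ASCII (searching, splitting,
or reusing it as a template with brace syntax) would trip over that
byte, even though the first interpolation succeeded.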

Of course you can say "if it hurts, don't do that", but ....


