[Python-Dev] Python and the Unicode Character Database

Stephen J. Turnbull stephen at xemacs.org
Sat Dec 4 09:13:45 CET 2010


Antoine Pitrou writes:
 > On Friday 3 December 2010 at 13:58 +0900, Stephen J. Turnbull
 > wrote:
 > > Antoine Pitrou writes:
 > > 
 > >  > The legacy format argument looks like a red herring to me. When
 > >  > converting from one format to another it is the programmer's job to
 > >  > do his/her job right.
 > > 
 > > Uhmmmmmm, the argument *for* this "feature" proposed by several people
 > > is that Python's numeric constructors do it (right) so that the
 > > programmer doesn't have to.
 > 
 > As far as I understand, Alexander was talking about a legacy pre-unicode
 > text format. We don't have to support this.

*I* didn't say we *should* support it.  I'm saying that *others'*
argument for not restricting the formats accepted by string-to-number
converters to something well-defined and AFAIK universally understood
by users (developers of Python programs *and* end-users) is that we
*already* support this.

Alexander, Martin, and I are basically just pointing out that no, the
"support" we have via the built-in numeric constructors is incomplete
and nonconforming.  We feel that is a bug to be fixed by (1)
implementing the definition as currently found in the documents, and
(2) moving the non-ASCII support to another module (or, as a
compromise, supporting non-ASCII digits via an argument to the
built-ins -- that was my proposal, I don't know if Alexander or Martin
would find it acceptable).
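
To make the compromise concrete, here is a minimal sketch of what an
ASCII-only converter with an opt-in argument might look like.  The name
strict_int and the keyword allow_non_ascii are hypothetical, invented
for illustration only; nothing like them was proposed in this form, and
the built-in int() has no such argument.

```python
import re

# ASCII-only decimal integer: optional sign followed by digits 0-9.
_ASCII_INT = re.compile(r"[+-]?[0-9]+")

def strict_int(text, allow_non_ascii=False):
    """Hypothetical helper: parse a base-10 integer.

    By default only ASCII digits are accepted.  Passing
    allow_non_ascii=True defers to the built-in int(), which also
    accepts other Unicode decimal digits.
    """
    if allow_non_ascii:
        return int(text)
    # Strip ASCII whitespace only, so exotic spaces are rejected too.
    stripped = text.strip(" \t\n\r\f\v")
    if not _ASCII_INT.fullmatch(stripped):
        raise ValueError("invalid ASCII integer literal: %r" % (text,))
    return int(stripped)
```

With this sketch, strict_int("42") succeeds, while a string of
Arabic-Indic digits raises ValueError unless allow_non_ascii=True is
passed explicitly.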

Given that some committers (MAL, you?) don't even consider it a bug
to accept and convert a string containing digits from multiple
scripts as a single number, I'd really rather that this
bug/feature not be embedded in the interpreter.  I suppose that as a
built-in rather than syntax, technically it doesn't fall under the
moratorium, but it makes me nervous....
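
For readers who have not seen the mixed-script case, the following
sketch shows it, assuming a current CPython 3 where int() accepts any
Unicode decimal digit.  The helper names digit_zero_points and
has_mixed_digits are hypothetical; they rely on the property that
Unicode decimal digits (category Nd) come in contiguous runs of ten, so
subtracting a digit's value from its code point identifies its run.

```python
import unicodedata

def digit_zero_points(text):
    """Return the code points of the 'zero' of each digit run used in text."""
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            # Nd digits are laid out 0..9 contiguously, so this is the
            # code point of the zero belonging to the same run as ch.
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros

def has_mixed_digits(text):
    """True if text draws its decimal digits from more than one run."""
    return len(digit_zero_points(text)) > 1

# ASCII "1", ARABIC-INDIC DIGIT TWO, DEVANAGARI DIGIT THREE
mixed = "1\u0662\u0969"
value = int(mixed)              # accepted as a single number
flag = has_mixed_digits(mixed)  # the check a stricter parser could apply
```

A converter living outside the built-ins could use a check like
has_mixed_digits to reject such input instead of silently returning a
number.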

