[Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions

Stephen J. Turnbull stephen at xemacs.org
Wed Jun 12 10:53:36 CEST 2013


Andrew Barnert writes:

 > MRAB suggested that maybe int and friends shouldn't do
 > transliteration at all; it would be better to have a new
 > "translate_number" function ("somewhere", not necessarily in
 > builtins, but presumably in the stdlib).

I'm pretty sure I suggested that first, and I know I've suggested it
on two or more different occasions.  I don't care about credit, but
it's pretty clear that nobody is paying much attention to what anybody
else is writing. :-(

 > In my email that you replied to, I was agreeing with that idea,

My apologies.  I read the whole thing twice, but failed to grasp that.

 > >>  If 二万三十, 2万3十, and 20030 all decode to 20030 from Japanese
 > > 
 > > Only the third does in a non-locale-specific way.  The characters for
 > > "man" and "juu" have numeric values, but not decimal ones.
 > 
 > I brought up Japanese as one of my original examples specifically
 > because it has common numeric forms that aren't decimal.

That wasn't clear to me.  In any case, my point is that such forms are
clearly irrelevant to the arguments MAL and Steven d'Aprano (among
others whose names I admit I've forgotten) present for "promiscuous"
builtins.  Those arguments specifically rely on the lack of ambiguity
of the *decimal* values for digits, and deliberately ignore the issue
of roundtripping.
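
To make the decimal/numeric distinction concrete, here is a minimal
sketch using the stdlib unicodedata module.  The helper names
describe() and try_int() are made up for this example; the behavior
shown is that of current CPython 3, as far as I know.

```python
import unicodedata

def describe(ch):
    """Return (decimal, numeric) for ch, with None where Unicode defines no value."""
    return unicodedata.decimal(ch, None), unicodedata.numeric(ch, None)

def try_int(text):
    """Return int(text), or None if the builtin refuses the string."""
    try:
        return int(text)
    except ValueError:
        return None

print(describe("2"))    # ASCII digit: has both a decimal and a numeric value
print(describe("２"))   # FULLWIDTH DIGIT TWO: also a decimal digit
print(describe("万"))   # "man": numeric value 10000.0, but no decimal value
print(describe("十"))   # "juu": numeric value 10.0, but no decimal value

print(try_int("２００３０"))           # fullwidth decimal digits are accepted
print(try_int("二万三十"))             # None: int() raises ValueError
print(try_int("\N{MINUS SIGN}5"))      # None: U+2212 is not accepted as a sign
```

Only characters with a decimal value are positional digits that int()
can consume without knowing anything about the locale.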

 > The point of reusing this example is that there are three
 > _different_ such forms in one locale. Locale-decoding all of them
 > to '20030' is easy, but locale-encoding '20030' is then a
 > problem.

Hardly.  '20030' will always be understood, so the builtin str() is
useful and often sufficient.  Picking one of the other forms is
application- (and often user-) dependent, just as representing the
integer 20030 as an ASCII string is ambiguous.  (Look how many format
characters we devote to that one task!  Heck, POSIX time is an
integer, so we could probably make a case for including most of
strftime(3)!)
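
As a rough illustration of how many ASCII spellings the one integer
already has, the following uses only the builtin str() and format();
the particular format specs are just a sample I picked.

```python
n = 20030

# Each line is a different, equally "correct" ASCII rendering of the same int.
renderings = {
    "str":        str(n),
    "grouped":    format(n, ","),
    "padded":     format(n, "08d"),
    "signed":     format(n, "+d"),
    "hex":        format(n, "x"),
    "octal":      format(n, "o"),
    "scientific": format(n, "e"),
}

for name, text in renderings.items():
    print(f"{name:>10}: {text}")
```

Which of these is wanted is the application's decision, not the
integer's.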

 > My question is whether those locale-specific functions are
 > well-specified.

They don't exist yet, so that's not an answerable question.

My suggestion is that, as with any translation, we ask the native
speakers for help.  As with any case of ambiguity, we refuse to guess
-- instead we provide multiple styles as suggested by the native-
speaking consultants or requested by users (resources permitting, of
course).

Yeah, I know, it's terribly mendokusai[1], but we really don't have an
alternative except to tell the users to do it themselves, because
that's exactly what they will do if they want a particular style we
don't provide.  It just seems to me that it would be useful to provide
a registry of styles that other people have already written, maybe on
PyPI, maybe in the stdlib.  If commonly used, it could become quite
flexible and robust in a fairly short period of time.
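
A minimal sketch of what such a registry might look like follows.
Everything in it is hypothetical: register(), format_number(), the
"ja" locale key, and the style names are invented for illustration,
and the kanji rules are one simplified convention that native
speakers would need to review.

```python
# Hypothetical registry mapping (locale, style) to a formatter function.
# Nothing here exists in the stdlib; it only shows the shape of the idea.
STYLES = {}

def register(locale, style):
    """Decorator that files a formatter under (locale, style)."""
    def decorator(func):
        STYLES[(locale, style)] = func
        return func
    return decorator

def format_number(n, locale, style):
    """Look up a registered style; refuse to guess if it is missing."""
    try:
        formatter = STYLES[(locale, style)]
    except KeyError:
        raise LookupError(
            f"no style {style!r} registered for locale {locale!r}") from None
    return formatter(n)

KANJI_DIGITS = "〇一二三四五六七八九"

def _kanji_below_10000(n):
    """Spell 0 < n < 10000 with sen/hyaku/juu; a leading 'one' is omitted."""
    out = []
    for unit, value in (("千", 1000), ("百", 100), ("十", 10)):
        digit, n = divmod(n, value)
        if digit:
            out.append(("" if digit == 1 else KANJI_DIGITS[digit]) + unit)
    if n:
        out.append(KANJI_DIGITS[n])
    return "".join(out)

@register("ja", "ascii")
def ja_ascii(n):
    return str(n)

@register("ja", "kanji")
def ja_kanji(n):
    # Assumption: this sketch stops below 10**8 (no "oku" and larger units).
    if not 0 <= n < 10**8:
        raise ValueError("this sketch only handles 0 <= n < 10**8")
    if n == 0:
        return KANJI_DIGITS[0]
    man, rest = divmod(n, 10000)
    head = _kanji_below_10000(man) + "万" if man else ""
    return head + _kanji_below_10000(rest)

print(format_number(20030, "ja", "ascii"))
print(format_number(20030, "ja", "kanji"))
```

Asking for an unregistered style raises LookupError rather than
falling back silently, which is the "refuse to guess" policy above.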

Footnotes: 
[1]  Literally, "smells troublesome" in Japanese.


