"Martin v. Löwis" wrote:
> On 30.11.2010 21:24, Ben Finney wrote:
>> haiyang kang <cornsea@gmail.com> writes:
>>> I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1
>>
>> That's a straw man, though. The string need not be a literal in the program; it can be input to the program.
>>
>>     num = float(input_from_the_external_world)
>>
>> Does that change your assessment of whether non-ASCII digits are used?
>
> I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that. You would need a number of keystrokes to enter each individual ideograph, plus you have to press the keys for keyboard layout switching to enter the Latin decimal separator (which you normally wouldn't use along with the Han numerals).
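A quick way to see the distinction this thread turns on, sketched in Python 3 (where str is Unicode; the thread itself concerns Python 2.7): float() and int() accept digits from any script in the Unicode "Nd" (decimal digit) category, but not Han numerals such as 一, which have a numeric value but no decimal-digit value. The helper parses_as_float is hypothetical, written only for this illustration.

```python
import unicodedata


def parses_as_float(text):
    """Hypothetical demo helper: return True if float() accepts `text`."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Arabic-Indic digits (U+0660..U+0669) are in category Nd, so float()
# maps them to their decimal values; the '.' is the ASCII full stop.
arabic_indic = "\u0663.\u0661\u0664"   # "٣.١٤"
print(float(arabic_indic))             # 3.14

# Fullwidth digits (U+FF10..U+FF19) are Nd as well.
fullwidth = "\uff11\uff12\uff13"       # "１２３"
print(int(fullwidth))                  # 123

# U+4E00 (一) has a numeric value of 1 but no decimal-digit value,
# so float() rejects the string from the original question.
print(unicodedata.numeric("\u4e00"))        # 1.0
print(unicodedata.decimal("\u4e00", None))  # None
print(parses_as_float("\u4e00.\u4e00"))     # False
```

The check on unicodedata.decimal() mirrors what float() relies on: only characters with a decimal-digit value are converted.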
That's a somewhat limited view of how numbers get into a program, IMHO. Numbers are not always entered using a computer keyboard: there are tools like cash registers, special numeric keypads, scanners, OCR, etc. for external entry, and there are also other programs producing such output, e.g. MS Office if configured that way. The argument about the decimal point doesn't work well either, since float() and int() obviously do not support localized input. E.g. in Germany we write 3,141 instead of 3.141:
>>> float('3,141')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): 3,141
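Because float() only understands the dot as decimal mark, German-formatted input has to be normalized by the application before conversion. A minimal sketch, assuming '.' is only ever used as the thousands separator and ',' as the decimal mark; parse_german_float is a hypothetical helper, not a standard-library function:

```python
def parse_german_float(text):
    """Convert a German-formatted number such as '1.234,5' to a float.

    Hypothetical helper: assumes '.' is only a thousands separator and
    ',' is the decimal mark, as in German-formatted input.
    """
    # Drop thousands separators first, then turn the decimal comma
    # into the dot that float() expects.
    normalized = text.strip().replace(".", "").replace(",", ".")
    return float(normalized)


print(parse_german_float("3,141"))     # 3.141
print(parse_german_float("1.234,5"))   # 1234.5
```

The standard library's locale.atof() does the same job using the conventions of the active LC_NUMERIC locale, e.g. after locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8"); the exact locale name is platform-dependent and the locale must be installed, which is why this sketch does the replacement by hand.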
That float() rejects this is no surprise. The localization of the input data, e.g. removal of thousands separators and conversion of decimal marks to the dot, has to be done by the application, just as you already have to do for German-formatted floating point numbers. The locale module already has locale.atof() and locale.atoi() for just this purpose.

FYI, here's a list of decimal digits supported by Python 2.7, taken from http://www.unicode.org/Public/5.2.0/ucd/extracted/DerivedNumericType.txt:

"""
0030..0039 ; Decimal # Nd [10] DIGIT ZERO..DIGIT NINE
0660..0669 ; Decimal # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9 ; Decimal # Nd [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
07C0..07C9 ; Decimal # Nd [10] NKO DIGIT ZERO..NKO DIGIT NINE
0966..096F ; Decimal # Nd [10] DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE
09E6..09EF ; Decimal # Nd [10] BENGALI DIGIT ZERO..BENGALI DIGIT NINE
0A66..0A6F ; Decimal # Nd [10] GURMUKHI DIGIT ZERO..GURMUKHI DIGIT NINE
0AE6..0AEF ; Decimal # Nd [10] GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
0B66..0B6F ; Decimal # Nd [10] ORIYA DIGIT ZERO..ORIYA DIGIT NINE
0BE6..0BEF ; Decimal # Nd [10] TAMIL DIGIT ZERO..TAMIL DIGIT NINE
0C66..0C6F ; Decimal # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
0CE6..0CEF ; Decimal # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0D66..0D6F ; Decimal # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0E50..0E59 ; Decimal # Nd [10] THAI DIGIT ZERO..THAI DIGIT NINE
0ED0..0ED9 ; Decimal # Nd [10] LAO DIGIT ZERO..LAO DIGIT NINE
0F20..0F29 ; Decimal # Nd [10] TIBETAN DIGIT ZERO..TIBETAN DIGIT NINE
1040..1049 ; Decimal # Nd [10] MYANMAR DIGIT ZERO..MYANMAR DIGIT NINE
1090..1099 ; Decimal # Nd [10] MYANMAR SHAN DIGIT ZERO..MYANMAR SHAN DIGIT NINE
17E0..17E9 ; Decimal # Nd [10] KHMER DIGIT ZERO..KHMER DIGIT NINE
1810..1819 ; Decimal # Nd [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
1946..194F ; Decimal # Nd [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE
19D0..19DA ; Decimal # Nd [11] NEW TAI LUE DIGIT ZERO..NEW TAI LUE THAM DIGIT ONE
1A80..1A89 ; Decimal # Nd [10] TAI THAM HORA DIGIT ZERO..TAI THAM HORA DIGIT NINE
1A90..1A99 ; Decimal # Nd [10] TAI THAM THAM DIGIT ZERO..TAI THAM THAM DIGIT NINE
1B50..1B59 ; Decimal # Nd [10] BALINESE DIGIT ZERO..BALINESE DIGIT NINE
1BB0..1BB9 ; Decimal # Nd [10] SUNDANESE DIGIT ZERO..SUNDANESE DIGIT NINE
1C40..1C49 ; Decimal # Nd [10] LEPCHA DIGIT ZERO..LEPCHA DIGIT NINE
1C50..1C59 ; Decimal # Nd [10] OL CHIKI DIGIT ZERO..OL CHIKI DIGIT NINE
A620..A629 ; Decimal # Nd [10] VAI DIGIT ZERO..VAI DIGIT NINE
A8D0..A8D9 ; Decimal # Nd [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
A900..A909 ; Decimal # Nd [10] KAYAH LI DIGIT ZERO..KAYAH LI DIGIT NINE
A9D0..A9D9 ; Decimal # Nd [10] JAVANESE DIGIT ZERO..JAVANESE DIGIT NINE
AA50..AA59 ; Decimal # Nd [10] CHAM DIGIT ZERO..CHAM DIGIT NINE
ABF0..ABF9 ; Decimal # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT NINE
FF10..FF19 ; Decimal # Nd [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
104A0..104A9 ; Decimal # Nd [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
1D7CE..1D7FF ; Decimal # Nd [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
"""

The Chinese and Japanese ideographs are not supported because of the way they are defined in the Unihan database. I'm currently investigating how we could support them as well.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 01 2010)