[Python-Dev] Python and the Unicode Character Database

Terry Reedy tjreedy at udel.edu
Fri Dec 3 02:52:23 CET 2010


On 12/2/2010 6:54 PM, Alexander Belopolsky wrote:
> On Thu, Dec 2, 2010 at 4:14 PM, M.-A. Lemburg<mal at egenix.com>  wrote:
> ..
>> Some examples:
>>
>> http://www.bdl.gov.lb/circ/intpdf/int123.pdf
>
> I looked at this one more closely.  While I cannot understand what it
> says, it appears that Arabic numerals are used in dates.  It looks
> like Python won't be able to deal with those:

When I travelled in S. Asia around 25 years ago, Arabic and Indic
numerals were in obvious use in stores, on road signs, and in banks (as
with money exchange receipts). I learned the digits partly for
self-protection ;-). I have no real idea of what is done *now* in
computerized business, but I assume the native digits are used.

It may well be that there is no Python software yet that operates with 
native digits. The lack of direct output capability would hinder that. 
Of course, someone could run both input and output through 
language-specific str.translate digit translators.
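
A minimal sketch of such digit translators, assuming the Arabic-Indic digits U+0660..U+0669. The helper names (parse_native_date, format_native_date) and the default format are made up for illustration and are not part of any library:

```python
from datetime import datetime

ASCII_DIGITS = "0123456789"
# ARABIC-INDIC DIGIT ZERO..NINE occupy U+0660..U+0669.
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# One translation table per direction.
TO_ASCII = str.maketrans(ARABIC_INDIC_DIGITS, ASCII_DIGITS)
TO_ARABIC_INDIC = str.maketrans(ASCII_DIGITS, ARABIC_INDIC_DIGITS)


def parse_native_date(text, fmt="%Y/%m/%d"):
    """Hypothetical helper: map native digits to ASCII, then parse."""
    return datetime.strptime(text.translate(TO_ASCII), fmt)


def format_native_date(value, fmt="%Y/%m/%d"):
    """Hypothetical helper: format with ASCII digits, then map to native."""
    return value.strftime(fmt).translate(TO_ARABIC_INDIC)
```

This only swaps digits; separators, month names, and text direction are left alone, so it is a workaround rather than real locale support.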

> >>> datetime.strptime('١٩٩٩/١٠/٢٩', '%Y/%m/%d')

Googling ١٩٩٩ gets about 83,000 hits.
> ..
> ValueError: time data '١٩٩٩/١٠/٢٩' does not match format '%Y/%m/%d'
>
> Interestingly,
>
> >>> datetime.strptime('١٩٩٩', '%Y')
> datetime.datetime(1999, 1, 1, 0, 0)
>
> which further suggests that support of such numerals is accidental.
>
> As I think more about it, though, I am becoming less averse to accepting
> these numerals for base 10 integers.

Both input and output are needed for educational programming, though 
translation tables might be enough.

> Integers can be easily extracted
> from text using simple regex and '\d' accepts all category Nd
> characters.  I would require though that all digits be from the same
> block, which is not hard because Unicode now promises to only have
> them in contiguous blocks of 10.

That seems sensible.
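
A rough sketch of the same-block rule, relying on the contiguity promise mentioned above: for a category Nd character, ord(ch) minus its decimal value gives the code point of that block's zero, so all digits of a number must agree on it. The function names are hypothetical, and signs and whitespace are deliberately not handled:

```python
import re
import unicodedata


def parse_same_block_int(text):
    """Parse a base-10 integer whose digits all come from one digit block."""
    if not text or any(unicodedata.category(ch) != "Nd" for ch in text):
        raise ValueError("not a run of decimal digits: %r" % (text,))
    # Each Nd block is a contiguous run 0..9, so this yields the block's zero.
    zeros = {ord(ch) - unicodedata.decimal(ch) for ch in text}
    if len(zeros) != 1:
        raise ValueError("digits from mixed blocks: %r" % (text,))
    return int(text)  # int() itself accepts any Nd digits


def extract_same_block_ints(text):
    """Find digit runs with '\\d' and keep only the single-block ones."""
    found = []
    for run in re.findall(r"\d+", text):
        try:
            found.append(parse_same_block_int(run))
        except ValueError:
            pass  # skip runs that mix digit blocks
    return found
```

Silently skipping mixed runs in the extractor is just one choice; raising instead may be the better behavior where security matters.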

> This rule seems to address some of the
> security issues because it is unlikely that a system that can display
> some of the local digits would not be able to display all of them
> properly.
>
> I still don't think it makes any sense to accept them in float().

For the present, I would pretty well agree with that, at least until we 
know more.

You have raised an important issue. It is a bit of a chicken-and-egg
problem, though. We will not really know what is needed until Python is
used more in non-English/non-European contexts, while such usage may
await better support.

-- 
Terry Jan Reedy



