On Wed, Dec 1, 2010 at 7:17 PM, Steven D'Aprano <steve@pearwood.info> wrote: ..
> we should continue to support the existing behaviour. None of the arguments against it seem convincing to me, particularly since the opponents of the current behaviour admit that there is a use-case for it, but they just want it to move elsewhere, such as the locale module.
I don't remember who made this argument, but I think you misunderstood it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are written. So far nobody has even claimed to know conclusively that Arabic-Indic digits are always written left-to-right.
>>> unicodedata.bidirectional('٤')
'AN'
is not very helpful because it means "any Arabic-Indic digit" according to unicode.org. (To me, a special category hints that it may be written in either direction and the proper interpretation may depend on context.) I have not seen a real use case reported in this thread, and for theoretical use cases the current implementation is either outright wrong or does not solve the problem completely. Given that a function that replaces all Unicode digits in a string with 0-9 can be written in 3 lines of Python code, it is very unlikely that anyone would prefer to rely on undocumented behavior of Python builtins instead of having explicit control over the parsing of their data.
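
For illustration, here is a minimal sketch of such a function. The name to_ascii_digits is hypothetical, and the sketch assumes that "Unicode digits" means characters with a decimal digit value (category Nd); it makes no attempt to handle digit order or mixed scripts, which is exactly the kind of detail a caller would want to control explicitly.

```python
import unicodedata

def to_ascii_digits(text):
    # Hypothetical helper: map each character that has a Unicode decimal
    # value to the matching ASCII digit; leave every other character as-is.
    # unicodedata.decimal(ch, None) returns None for non-decimal characters.
    return ''.join(
        ch if unicodedata.decimal(ch, None) is None
        else str(unicodedata.decimal(ch))
        for ch in text
    )
```

With this, to_ascii_digits('٤٢') gives '42', which can then be passed to int() or float(), while ordinary text and characters such as superscript digits are left unchanged.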