[Python-Dev] Python and the Unicode Character Database

Lennart Regebro regebro at gmail.com
Tue Nov 30 09:10:37 CET 2010


On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.

Why? I can see the problem if a character that was previously accepted
no longer is: that breaks backwards compatibility. Accepting additional
characters doesn't.

> >>> float('١٢٣٤.٥٦')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

*I* think it is more important. In Python 3, you can never assume
anything is ASCII any more. ASCII is practically dead and buried as far
as Python goes, unless you explicitly encode to it.
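
To make the point concrete, here is a minimal sketch of what float()
accepts in Python 3 and of what it takes to get back to ASCII
explicitly. The helper name to_ascii_digits is hypothetical and not
part of any library, and str.isascii() assumes Python 3.7 or later.

```python
import unicodedata

# Arabic-Indic digits for 1234.56, written as escapes so this file stays ASCII.
text = "\u0661\u0662\u0663\u0664.\u0665\u0666"

value = float(text)        # float() accepts Unicode decimal digits
is_ascii = text.isascii()  # the accepted text is not ASCII


def to_ascii_digits(s):
    """Hypothetical helper: replace each Unicode decimal digit with 0-9."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in s
    )


normalized = to_ascii_digits(text)
print(value, is_ascii, normalized)
```

A program that really needs ASCII has to ask for it, either by
normalizing as above or by calling text.encode("ascii") and handling
the UnicodeEncodeError.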

> def deposit(self, amountstr):
>       self.balance += float(amountstr)
>       audit_log("Deposited: " + amountstr)
>
> Auditor:
>
> $ cat numbered-account.log
> Deposited: ?????.??

That log should reasonably be in UTF-8 or another encoding that can
represent the text, in which case this is not a problem. And that's
ignoring that it makes far more sense to log the numerical amount.
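
A rough sketch of both suggestions, assuming the audit log is a text
stream opened with a UTF-8 encoding. The Account class, its log
attribute, and the log line format are assumptions made for
illustration; only deposit() and the "Deposited:" prefix come from the
quoted example.

```python
import io


class Account:
    """Illustrative account that writes an audit line per deposit."""

    def __init__(self, log):
        self.balance = 0.0
        self.log = log  # any text stream; assumed to be UTF-8 encoded

    def deposit(self, amountstr):
        amount = float(amountstr)
        self.balance += amount
        # Log the parsed number first; the raw input follows for reference.
        self.log.write(
            "Deposited: {!r} (input: {})\n".format(amount, amountstr)
        )


# An in-memory stand-in for a log file opened with encoding="utf-8".
raw = io.BytesIO()
log = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

account = Account(log)
account.deposit("\u0661\u0662\u0663\u0664.\u0665\u0666")
log.flush()

logged = raw.getvalue().decode("utf-8")
print(logged, end="")
```

The numeric part of the line is plain ASCII whatever digits the user
typed, and the raw input survives intact because the stream is UTF-8.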

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64

