[Python-Dev] Unicode locale values in 2.7

Mark Dickinson dickinsm at gmail.com
Thu Dec 3 12:55:11 CET 2009


On Thu, Dec 3, 2009 at 11:33 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Eric Smith <eric <at> trueblade.com> writes:
>>
>> But in trunk, the value is just used as-is. So when formatting a decimal,
>> for example, '\xc2\xa0' is just inserted into the result, such as:
>> >>> format(Decimal('1000'), 'n')
>> '1\xc2\xa0000'
>> This doesn't make much sense,
>
> Why doesn't it make sense? It's normal UTF-8.
> The same thing happens when the monetary sign is non-ASCII, see
> Lib/test/test_locale.py for an example.

Well, one problem is that it messes up character counts.  Suppose
you're aware that the thousands separator might be a single multibyte
character, and you want to produce a unicode result that's zero-padded
to a width of 6.  There's currently no sensible way of doing this that
I can see:

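format(Decimal('1000'), '06n').decode('utf-8') gives a string of length 5:
the separator's two bytes both count toward the width of 6, so no zero
is added, and only 5 characters remain after decoding.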

format(Decimal('1000'), u'06n') fails with a UnicodeDecodeError.
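
The counting problem can be reproduced without depending on the machine's
locale. The following is only a sketch: it hard-codes the byte string quoted
above instead of calling format() under a real locale, and the variable names
are made up for illustration. The ASCII decode at the end stands in for the
implicit str-to-unicode coercion that 2.x performs, which I am assuming is
what triggers the UnicodeDecodeError.

```python
# Sketch only: the formatted value is hard-coded, not produced by a locale.
# It mirrors the '1\xc2\xa0000' result quoted earlier in the thread.
sep = u'\u00a0'.encode('utf-8')      # no-break space as UTF-8: two bytes
formatted = b'1' + sep + b'000'      # already 6 bytes, so '06n' adds no zero

byte_len = len(formatted)                   # what the width check sees
char_len = len(formatted.decode('utf-8'))   # what the reader sees

# Assumption: mixing this byte string with a unicode format spec forces an
# implicit ASCII decode in 2.x; an explicit ASCII decode shows the failure.
try:
    formatted.decode('ascii')
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

print(byte_len, char_len, ascii_decode_failed)
```

The width is satisfied in bytes but not in characters, which is why the
decoded result comes out one character short.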

Mark

