[Python-Dev] Unicode locale values in 2.7

"Martin v. Löwis" martin at v.loewis.de
Thu Dec 3 14:49:13 CET 2009


> But in trunk, the value is just used as-is. So when formatting a decimal,
> for example, '\xc2\xa0' is just inserted into the result, such as:
> >>> format(Decimal('1000'), 'n')
> '1\xc2\xa0000'
> This doesn't make much sense

I agree with Antoine: it makes sense, and is the correct answer, given
the locale definition.

Now, I think that the locale definition is flawed - it's *not* a
property of the Czech language or culture that the "no-break space"
character is the thousands-separator. If anything other than the regular
space should be the thousands separator, it should be "thin space", and
it should be used in all locales on a system that currently use space.
Having it just in the Czech locale is a misconfiguration, IMO.

But if we accept the system's locale definition, then the above is
certainly the right answer.

> and causes an error when converting it to
> unicode:
> >>> format(Decimal('1000'), u'n')

You'll need to decode the result using the locale's encoding; then it
would work. Unfortunately, that is difficult to achieve.
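
As a rough illustration of the decoding step, here is a minimal sketch.
The helper name decode_locale_bytes is hypothetical, and the sketch
assumes that locale.getpreferredencoding() reports the same encoding the
localeconv() values are stored in, which is not guaranteed on every
system:

```python
import locale

def decode_locale_bytes(raw, encoding=None):
    """Decode a byte string produced under the current locale into text."""
    if encoding is None:
        # Assumption: the preferred encoding matches the encoding of the
        # locale's separator strings. This may not hold everywhere.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)

# The byte string from the quoted example, decoded as UTF-8 explicitly
# so the result does not depend on the machine's locale settings.
formatted = b'1\xc2\xa0000'
text = decode_locale_bytes(formatted, 'utf-8')
```

Here the two bytes '\xc2\xa0' become the single no-break space character
U+00A0, so text is five characters long.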

> I believe that the correct solution is to do what py3k does in locale,
> which is to convert the struct lconv values to unicode. But since this
> would be a disruptive change if universally applied, I'd like to propose
> that we only convert to unicode if the values won't fit into a str.

I think Guido is on record for objecting to that kind of API strongly.

> So the algorithm would be something like:
> 1. call mbstowcs
> 2. if every value in the result is in the range [32, 126], return a str
> 3. otherwise, return a unicode

Not sure what API you are describing here - the algorithm for doing
what?
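
Read literally, the three quoted steps might look like the sketch below.
This is only one interpretation: the function name str_or_unicode is
hypothetical, and bytes.decode() with an explicit encoding stands in for
mbstowcs, which Python code cannot call directly:

```python
def str_or_unicode(raw, encoding):
    """Return raw unchanged if it is printable ASCII, else decoded text."""
    # Step 1: stand-in for mbstowcs, decoding with an explicit encoding.
    text = raw.decode(encoding)
    # Step 2: if every code point is in [32, 126], keep the byte string.
    if all(32 <= ord(ch) <= 126 for ch in text):
        return raw
    # Step 3: otherwise return the decoded (unicode) value.
    return text
```

Note that the type of the return value depends on the data passed in, so
callers cannot know in advance which type they will receive.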

> This would mean that for most locales, the current behavior in trunk
> wouldn't change: the locale.localeconv() values would continue to be
> str. Only for those locales where the values wouldn't fit into a str
> would unicode be returned.
> 
> Does this seem like an acceptable change?

Definitely not. This will be just for 2.7, and I see no point in
producing such an incompatibility. Applications may already perform
the conversion themselves, and that would break under such a change.

Regards,
Martin


