[EuroPython] i18n and related issues

M.-A. Lemburg mal@lemburg.com
Tue, 29 Apr 2003 18:15:18 +0200


Magnus Lyckå wrote:
> A subject which is typically ignored in python books, and very relevant
> in a python business perspective is localization and internationalization.
>
> Honestly, I haven't tried very hard, but this seems a bit confusing for
> me in Python. (Not that I can say that it's better in other languages.)
>=20
> Is it just me, or do others feel the same?
>
> How important is this to people, and what could be done about it? Something
> to discuss at EuroPython 2003? (Are there plans for any BoFs etc.)

We can discuss many things, but would that make any difference?
Getting this right is a lot of work and I don't see any funding
or interest from volunteers to get any of it done.

> This field contains a number of issues, from translation of messages to
> input and output of data formatted according to locale, and issues like
> Unicode and right-to-left text etc. It seems to me that this is far from
> ideal today. At least in Windows 2000, locale seems buggy:
>
>  >>> locale.getdefaultlocale()
> ('sv_SE', 'cp1252')
>  >>> locale.setlocale(locale.LC_ALL, '')
> 'Swedish_Sweden.1252'
>  >>> locale.getlocale()
> ['Swedish_Sweden', '1252']
>
> You see, they aren't the same! This leads to:

You're mixing character sets with locales here.

>  >>> locale.resetlocale()
> Error: locale setting not supported
>
> Also, the example from the manual breaks:
>  >>> locale.setlocale(locale.LC_ALL, 'de')
> Error: locale setting not supported

That's probably because your Windows version doesn't support the
German locale. No surprise here :-) It should work on Linux which
usually comes with all sorts of locale information.
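
Since the accepted locale names differ from platform to platform, one
option is to try several spellings and keep the first that works. This is
only a minimal sketch: the helper name set_first_supported is hypothetical,
and the candidate spellings in the usage line are guesses that may or may
not exist on a given system.

```python
import locale


def set_first_supported(candidates, category=locale.LC_ALL):
    """Try each locale name in turn.

    Returns the string reported by setlocale() for the first name the
    C library accepts, or None if every candidate is rejected.
    """
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            # This spelling is not known on this platform; try the next.
            continue
    return None


# Candidate spellings are assumptions; which one works (if any)
# depends on the operating system and the installed locale data.
chosen = set_first_supported(["de_DE.UTF-8", "de_DE", "German", "de"])
print(chosen)
```

If none of the names is installed, the call returns None and the
process locale is left unchanged.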

> Also note, that "some string".decode(locale.getlocale()[1]) won't work
> in Windows, but "some string".decode(locale.getdefaultlocale()[1])
> will work. There is no code page called just '1252'. Certainly confusing
> and non-obvious to me. Not fun if we try to use non-default settings.

The codec registry only knows about "cp1252" because that's
the standard name. We can't go about and add all possible
aliases for each and every encoding out there.
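
On the caller's side, the bare code page number can be turned into a
codec name before decoding. A minimal sketch, assuming the locale reports
a Windows code page such as '1252'; the helper name codec_name_for is
hypothetical, and some Python versions may already accept the bare number
as an alias, which is why the unmodified name is tried first.

```python
import codecs


def codec_name_for(codepage):
    """Map a locale-reported code page (e.g. '1252') to a codec name.

    Tries the value as given, then with a 'cp' prefix, and returns the
    canonical name of the first codec the registry can find.
    """
    for candidate in (codepage, "cp" + codepage):
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            # Not a registered codec name or alias; try the next form.
            continue
    raise LookupError("no codec found for %r" % (codepage,))


print(codec_name_for("1252"))
```

The result can then be passed to decode() instead of the raw value
from locale.getlocale()[1].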

> How do we display dates and times according to locale? Does the new
> date module handle that? locale.atof and locale.format can at least
> display floats right. (I think.)
>
> To the extent that the code is there, it's just briefly described in the
> docs, and very little in all the Python books out there. Is this really
> such a peripheral issue?

Probably not too interesting to the US folks :-) Everybody
else seems to be using their own little tool sets for this.
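
For the date and time part of the question, the standard library route
is time.strftime with the %x and %X directives, which follow the LC_TIME
category. The sketch below is one possible wrapper, not an established
recipe: the function name format_local_date is hypothetical, and the
output for any locale other than "C" depends on the platform's locale
data.

```python
import locale
import time


def format_local_date(time_tuple, locale_name="C"):
    """Format a time tuple using the date/time style of locale_name.

    The LC_TIME setting is switched only for the duration of the call
    and restored afterwards, because the locale is process-wide state.
    """
    previous = locale.setlocale(locale.LC_TIME)  # query current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
        return time.strftime("%x %X", time_tuple)
    finally:
        locale.setlocale(locale.LC_TIME, previous)


# Tuesday, 29 April 2003, 18:15:18 as a 9-item time tuple.
stamp = (2003, 4, 29, 18, 15, 18, 1, 119, 0)
print(format_local_date(stamp, "C"))
```

Passing an installed locale name instead of "C" would give that
locale's conventions; an unknown name raises locale.Error.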

> What about unicode and locale. They don't seem to get along extremely
> well today... For instance it seems x.sort(locale.strcoll) can't handle
> Unicode strings.

Right, collation support is still missing from the Unicode
implementation.

> Ok, in case of doubt, refuse the temptation to guess. No
> one locale defines collation for all of unicode, but there will be more
> and more cases where we want to sort names from all over the world, with
> at least accents in place. How?

Using the collation support defined in the Unicode
standard (provided that someone writes the support code
needed for the Python implementation).
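
Until such support exists, a rough stand-in is to sort on the base
letters with accents stripped. To be clear, this is NOT the collation
algorithm from the Unicode standard and it ignores all language-specific
rules (Swedish, for example, sorts "Å" after "Z", which this does not);
it is only a sketch of a locale-independent fallback, and the name
rough_sort_key is hypothetical.

```python
import unicodedata


def rough_sort_key(text):
    """Crude, locale-independent sort key.

    Decomposes the string (NFD), drops combining marks such as accents,
    and lowercases the rest. The original string is kept as a
    tie-breaker so that the ordering stays deterministic.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


names = ["Zoë", "Émile", "Eva", "Åsa", "Emil"]
print(sorted(names, key=rough_sort_key))
```

With this key, accented names land next to their unaccented
neighbours instead of after "Z", which is where a plain code point
sort would put them.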

> locale.nl_langinfo is only available on some platforms... Etc etc.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Software directly from the Source  (#1, Apr 29 2003)
 >>> Python/Zope Products & Consulting ...         http://www.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
EuroPython 2003, Charleroi, Belgium:                        56 days left