[Python-Dev] Import and unicode: part two

"Martin v. Löwis" martin at v.loewis.de
Wed Jan 26 11:12:02 CET 2011

On 26.01.2011 10:40, Victor Stinner wrote:
> On Monday, 24 January 2011 at 19:26 -0800, Toshio Kuratomi wrote:
>> Why not locale:
>> * Relying on locale is simply not portable. (...)
>> * Mixing of modules from different locales won't work. (...)
> I don't understand what you are talking about.

I think by "portability", he means "moving files from one computer to
another". He argues that if Python were to mandate UTF-8 for all file
names on Unix, files moved in such a way would keep their names,
whereas using the locale's filename encoding might not (if the locale
uses a different charset on the target system).

While this is technically true, I don't think it's a helpful way of
thinking: by mandating that file names are UTF-8 when accessed from
Python, we make the actual files inaccessible on both the source and
the target system.
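To make "inaccessible" concrete, here is a minimal Python 3 sketch, assuming a file that was created under an ISO-8859-1 locale with the name "café". Under a strict UTF-8 rule there is simply no str that names those bytes. The helper names below are hypothetical; the "surrogateescape" error handler is the PEP 383 mechanism Python 3 uses for file names on POSIX, which keeps such files reachable.

```python
# Hypothetical file name created on a system whose locale uses ISO-8859-1.
raw_name = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'


def decode_strict_utf8(name: bytes):
    """Decode as strict UTF-8; return None if the bytes are not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_with_escape(name: bytes) -> str:
    """Decode as UTF-8, mapping undecodable bytes to lone surrogates (PEP 383)."""
    return name.decode("utf-8", "surrogateescape")


strict = decode_strict_utf8(raw_name)
escaped = decode_with_escape(raw_name)

print(strict)          # None: no str names this file under a strict UTF-8 rule
print(ascii(escaped))  # 'caf\udce9'

# The escaped str encodes back to the exact original bytes, so the file
# can still be opened, renamed, or deleted.
print(escaped.encode("utf-8", "surrogateescape") == raw_name)  # True
```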

> I don't understand the relation between the local filesystem encoding
> and the portability. I suppose that you are talking about the
> distribution of a module to other computers. Here the question is how
> the filenames are stored during the transfer. The user is free to use
> any tool, and try to find a tool handling Unicode correctly :-) But it's
> no longer Python's problem.

There are cases where there is no real "transfer", in the sense in which
you are using the word. For example, with NFS, you can access the very
same file simultaneously on two systems, with no file name conversion
(unless you are using NFSv4 and your NFSv4 implementations properly
support its UTF-8 mandate).

Also, if two users of the same machine have different locale settings,
the same file name might be interpreted differently.
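This can be illustrated in a few lines of Python 3: one byte string, as it sits on disk or on an NFS share, decoded under two different locale charsets. The user names and the choice of UTF-8 versus ISO-8859-1 are hypothetical.

```python
# The raw bytes of one file name as stored on disk (or served over NFS).
raw_name = b"caf\xc3\xa9"  # "café" encoded in UTF-8

# Two hypothetical users whose locales use different charsets.
locale_charsets = {"alice": "utf-8", "bob": "iso-8859-1"}

# Each user's process decodes the same bytes with its own locale charset.
decoded = {user: raw_name.decode(charset)
           for user, charset in locale_charsets.items()}

for user, name in decoded.items():
    print(user, name)
# alice café
# bob cafÃ©
```

Neither decoding fails, which is what makes the mismatch easy to miss: both users see a plausible name, just not the same one.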

