unicode filenames

Carlos Ribeiro cribeiro at mail.inet.com.br
Thu Feb 6 14:44:52 CET 2003

On Thursday 06 February 2003 11:16 am, Beni Cherniavsky wrote:
> Since unix can't afford to change all APIs and programs like windows did
> (the mess that resulted explains why <wink>), unix must stay with
> byte-oriented filenames at the low level.  This ensures that all programs
> that store file names in files, etc., continue to work.  UTF-8 is the only
> encoding able to represent all of Unicode that also satisfies these needs,
> so everybody should migrate to UTF-8 filenames (CJK users might have
> reservations about this; I'd be happy to learn their opinion).

Sorry, but that would be a big mess. Here in Brazil, it is nearly impossible
to find a computer *without* filenames containing Latin-1 accented
characters. Not to mention the problems we have when mounting FAT partitions
under Linux: many Unix users still need dual-boot machines in order to use a
few Windows apps.
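To make the clash concrete, here is a small Python sketch (an illustration, not something from the thread): the same accented name produces different bytes in Latin-1 and UTF-8, and Latin-1 bytes for accented characters are usually not valid UTF-8. The helper `looks_like_utf8` and the sample name are hypothetical, chosen only for this example.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "relatório.txt"
latin1_bytes = name.encode("latin-1")  # b'relat\xf3rio.txt', one byte for 'ó'
utf8_bytes = name.encode("utf-8")      # b'relat\xc3\xb3rio.txt', two bytes for 'ó'

# 0xF3 starts a multi-byte UTF-8 sequence but is followed by an ASCII byte.
print(looks_like_utf8(latin1_bytes))  # False
print(looks_like_utf8(utf8_bytes))    # True

# Reading the UTF-8 bytes back as Latin-1 gives the familiar mojibake.
print(utf8_bytes.decode("latin-1"))   # relatÃ³rio.txt
```

So a system that simply starts interpreting every existing name as UTF-8 would either reject the old Latin-1 names or display them garbled.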

In my opinion, this type of problem has to be solved at its root, by slowly
migrating the filesystem itself to accept only UTF-8 filenames. During the
migration phase, all conversions have to be done by the operating system
itself: when files are moved from one filesystem to another, it would do the
necessary conversions. It's not going to be easy, though.
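The per-name conversion that such a migration would perform can be sketched in user space with Python. This is only a sketch under stated assumptions: every name that is not already valid UTF-8 is ASSUMED to be Latin-1, and `to_utf8_name` and `migrate_directory` are hypothetical helpers, not an existing tool or the OS-level mechanism proposed above.

```python
from __future__ import annotations

import os


def to_utf8_name(raw: bytes, legacy: str = "latin-1") -> bytes:
    """Hypothetical helper: re-encode one filename (as raw bytes) to UTF-8.

    Names that already decode as UTF-8 are returned unchanged; anything else
    is ASSUMED to be in the `legacy` encoding (Latin-1 by default).
    """
    try:
        raw.decode("utf-8")
        return raw  # already UTF-8 (pure ASCII names land here too)
    except UnicodeDecodeError:
        # With the default Latin-1, every byte maps to a character,
        # so this decode cannot fail.
        return raw.decode(legacy).encode("utf-8")


def migrate_directory(path: bytes, legacy: str = "latin-1") -> list[tuple[bytes, bytes]]:
    """Hypothetical helper: rename the entries directly inside `path` to UTF-8.

    Does not recurse and refuses to overwrite an existing entry.
    Returns the (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    # Passing a bytes path makes os.listdir return raw bytes names,
    # so nothing is decoded before we inspect it.
    for old_name in os.listdir(path):
        new_name = to_utf8_name(old_name, legacy)
        if new_name == old_name:
            continue
        new_path = os.path.join(path, new_name)
        if os.path.lexists(new_path):
            raise FileExistsError(new_path)
        os.rename(os.path.join(path, old_name), new_path)
        renamed.append((old_name, new_name))
    return renamed
```

For example, `migrate_directory(b"/some/old/dir")` (a placeholder path) would rename `b'relat\xf3rio.txt'` to `b'relat\xc3\xb3rio.txt'`. The guess is the weak point: a Latin-1 name that happens to form valid UTF-8 would be left alone, which is one reason to prefer doing the conversion in the operating system, where the source filesystem's encoding is known rather than guessed.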

Carlos Ribeiro
cribeiro at mail.inet.com.br
