unicode filenames

Neil Hodgson nhodgson at bigpond.net.au
Mon Feb 3 12:00:19 CET 2003

Andrew Dalke:

> And what happens when a remote file is mounted, say, from a MS
> Windows OS?  Are they represented as UTF-8?  Something else?
> Is that standardized or is it a property of the mount mechanism
> and can change accordingly?

   The default mount options I have seen turn the Unicode file names into
'?'s. However, on my machine there is a VFAT file system containing some
Unicode file names, and mounting that partition from Linux with the utf8
option, as in this mount entry:

   /dev/hda5 /eff vfat auto,shortname=winnt,utf8,owner 0 0

leads to UTF-8 strings being returned to user programs. Since Red Hat 8.0
defaults to UTF-8 locales, many programs, such as Nautilus and the standard
GTK+ file open dialog, display these file names correctly, although some
characters still do not appear because the default UI fonts lack the
required glyphs. Still, European, Cyrillic, and Greek characters were
displayed correctly, while Asian characters were often shown as boxes with
their codes inside.
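From a program's point of view, such a mount just hands back UTF-8 encoded byte strings, and the names only look right if the program decodes them with the matching encoding. Here is a minimal Python 3 sketch of that decoding step. It assumes the names are read as raw bytes (passing a bytes path to `os.listdir` returns undecoded names) and that falling back to Latin-1, which accepts any byte sequence, is acceptable. `decode_filename` and `list_decoded` are hypothetical helpers, not part of any library:

```python
import os

def decode_filename(raw, encoding="utf-8"):
    """Decode a raw file name, falling back to Latin-1 if it is not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail,
        # though the result may be mojibake if the bytes used another encoding.
        return raw.decode("latin-1")

def list_decoded(path=b"."):
    """List a directory, decoding each name with decode_filename."""
    # A bytes argument makes os.listdir return the names as undecoded bytes.
    return [decode_filename(name) for name in os.listdir(path)]
```

With a name such as `b"caf\xe9"`, which is not valid UTF-8, the first decode fails and the Latin-1 fallback returns `'café'`.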

>    if os.path.supports_unicode_filenames:
>      cwd = os.getcwdu()
>    else:
>      encoding = .. get default filesystem encoding ... or 'latin-1'
>      cwd = unicode(os.getcwd(), encoding)
> Ugly .. quite ugly.  And suggestions on the proper way to
> handle this are not documented as far as I can find.

   Yes, it is ugly, but I don't know how to handle this well on Unix. In my
example above, one partition is mounted in UTF-8 mode, but other partitions
could be using other encodings. I imagine there is some way to find the
mount options for a given directory...
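On Linux, one way to pursue that idea is to look the directory up in `/proc/mounts`, take the longest mount point that contains it, and check that mount's options for `utf8`. The sketch below is Python 3 and assumes the usual `/proc/mounts` layout (device, mount point, type, comma-separated options, ...). The helper names are hypothetical. Only the plain `utf8` option is recognized, matching is a simple path-prefix comparison, and every other case falls back to `sys.getfilesystemencoding()`, mirroring the quoted pseudocode:

```python
import os
import sys

def parse_mounts(text):
    """Return (mount_point, options) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        mounts.append((mount_point, fields[3].split(",")))
    return mounts

def options_for(path, mounts):
    """Return the options of the longest mount point containing path."""
    best, best_opts = "", []
    for mount_point, opts in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later lines describe mounts stacked on top.
            if len(mount_point) >= len(best):
                best, best_opts = mount_point, opts
    return best_opts

def encoding_for(path, mounts):
    """Guess the file name encoding used under path."""
    if "utf8" in options_for(path, mounts):
        return "utf-8"
    return sys.getfilesystemencoding() or "latin-1"

def current_dir(mounts_text):
    """Return the current directory as str, decoded per its mount."""
    raw = os.getcwdb()        # undecoded bytes from the OS
    path = os.fsdecode(raw)   # used only to match against mount points
    encoding = encoding_for(path, parse_mounts(mounts_text))
    return raw.decode(encoding, errors="replace")
```

On a Linux system it could be called as `current_dir(open("/proc/mounts").read())`. Passing the mount table in as text, rather than reading it inside the helpers, keeps the lookup testable on machines without `/proc`.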


More information about the Python-list mailing list