[Python-Dev] fun with unicode, part 1
Tim Peters
tim_one@email.msn.com
Tue, 2 May 2000 03:20:52 -0400
[Guido asks good questions about how Windows deals w/ Unicode filenames,
last Thursday, but gets no answers]
> ...
> I'd like to solve this problem, but I have some questions: what *IS*
> the encoding used for filenames on Windows? This may differ per
> Windows version; perhaps it can differ per drive letter? Or per
> application or per thread? On Windows NT, filenames are supposed to
> be Unicode. (I suppose also on Windows 2000?) How do I open a file
> with a given Unicode string for its name, in a C program? I suppose
> there's a Win32 API call for that which has a Unicode variant.
>
> On Windows 95/98, the Unicode variants of the Win32 API calls don't
> exist. So what is the poor Python runtime to do there?
>
> Can Japanese people use Japanese characters in filenames on Windows
> 95/98? Let's assume they can. Since the filesystem isn't Unicode
> aware, the filenames must be encoded. Which encoding is used? Let's
> assume they use Microsoft's multibyte encoding. If they put such a
> file on a floppy and ship it to Linköping, what will Fredrik see as
> the filename? (I.e., is the encoding fixed by the disk volume, or by
> the operating system?)
>
> Once we have a few answers here, we can solve the problem. Note that
> sometimes we'll have to refuse a Unicode filename because there's no
> mapping for some of the characters it contains in the filename
> encoding used.
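Guido's last point, that some Unicode filenames must be refused because the target encoding has no mapping for them, can be sketched in modern Python. This is a minimal illustration under stated assumptions: `encode_filename` is a hypothetical helper, and `cp1252` (Western European) and `cp932` (Microsoft's Japanese Shift-JIS variant) are example code pages only. Which code page is actually in effect on a given Windows 95/98 machine is exactly the open question above.

```python
# Hypothetical helper: try to represent a Unicode filename in a legacy
# "ANSI" code page, refusing it when some character has no mapping.
# The code page names used below are illustrative assumptions.

def encode_filename(name: str, codepage: str) -> bytes:
    """Return name encoded in codepage, or raise ValueError on no mapping."""
    try:
        # str.encode uses strict error handling by default, so any
        # unmappable character raises UnicodeEncodeError.
        return name.encode(codepage)
    except UnicodeEncodeError as exc:
        bad = name[exc.start:exc.end]
        raise ValueError(
            f"{name!r} cannot be represented in {codepage}: "
            f"no mapping for {bad!r}"
        ) from exc


print(encode_filename("caf\u00e9.txt", "cp1252"))  # b'caf\xe9.txt'
```

With these assumptions, a Japanese name such as `"\u65e5\u672c.txt"` encodes under `cp932` but is refused under `cp1252`, which is the "refuse" case Guido describes.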
I just thought I'd repeat the questions <wink>. However, I don't think
you'll really want the answers -- Windows is a legacy-encrusted mess, and
there are always many ways to get a thing done in the end. For example ...
> Question: how does Fredrik create a file with a Euro
> character (u'\u20ac') in its name?
This particular one is shallower than you were hoping: in many of the
TrueType fonts (e.g., Courier New but not Courier), Windows extended its
Latin-1 encoding by mapping the Euro symbol to the "control character" 0x80.
So I can get a Euro symbol into a file name just by holding Alt and typing 0128 on the numeric keypad.
This is true even on US Win98 (which has no visible Unicode support) -- but
was not supported in US Win95.
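The "extended Latin-1" Tim describes corresponds to what Python's codec registry calls `cp1252`. The sketch below assumes that cp1252 is the active ANSI code page and that the Alt+0128 keystroke enters byte 0x80 of that code page. It shows that cp1252 and true ISO-8859-1 disagree about that byte. `encode_or_none` is a hypothetical helper written for this illustration.

```python
from typing import Optional

EURO = "\u20ac"  # EURO SIGN, the character in the question above


def encode_or_none(text: str, codec: str) -> Optional[bytes]:
    """Encode text with codec, returning None if any character is unmappable."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# Windows code page 1252 assigns the Euro sign to byte 0x80.
cp1252_bytes = encode_or_none(EURO, "cp1252")
# ISO-8859-1 (Latin-1) proper has no Euro sign at all.
latin1_bytes = encode_or_none(EURO, "latin-1")
# Decoding the same byte shows the disagreement: cp1252 yields the Euro
# sign, while Latin-1 yields the C1 control character U+0080.
as_cp1252 = b"\x80".decode("cp1252")
as_latin1 = b"\x80".decode("latin-1")

print(cp1252_bytes)         # b'\x80'
print(latin1_bytes)         # None
print(as_cp1252 == EURO)    # True
print(as_latin1 == "\x80")  # True
```

This is why a filename that looks fine on one Windows box can show up as a control character (or as something else entirely) when the same bytes are read under a different code page.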
i've-been-tracking-down-what-appears-to-be-a-hw-bug-on-a-japanese-laptop-
at-work-so-can-verify-ms-sure-got-japanese-characters-into-the-
filenames-somehow-but-doubt-it's-via-unicode-ly y'rs - tim