[Python-3000] Unicode and OS strings

Paul Moore p.f.moore at gmail.com
Fri Sep 21 17:59:43 CEST 2007


On 21/09/2007, Jim Jewett <jimjjewett at gmail.com> wrote:
> If you are using text (as opposed to bytes), then À can be either
> U+00C0 or <U+0041, U+0300>.  If the file system makes a distinction,
> then it is using bytes, and any program interacting with it needs* to
> use bytes too.

OK. I don't know enough about Unicode (or this low a level of the
Windows API) to be sure. But it's certainly possible that under
Windows, the file system (API) doesn't make a distinction.
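
For anyone following along, the two spellings Jim mentions can be seen
from Python 3 with the standard unicodedata module. This is only a
sketch of the text-level equivalence; it says nothing about what any
particular file system does, and the variable names are purely
illustrative.

```python
import unicodedata

composed = "\u00C0"          # À as a single precomposed code point
decomposed = "\u0041\u0300"  # A followed by COMBINING GRAVE ACCENT

# As code-point sequences the two strings differ...
print(composed == decomposed)      # False

# ...and so do their UTF-8 encodings.
print(composed.encode("utf-8"))    # b'\xc3\x80'
print(decomposed.encode("utf-8"))  # b'A\xcc\x80'

# Normalizing to a common form (NFC or NFD) makes them compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed)             # True
print(nfd == decomposed)           # True
```

So a layer that compares names byte for byte will treat these as two
different names unless something normalizes them first.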

> * To be correct; in practice, the problems will occur rarely enough
> that most people won't notice.

Too right. The only explicit case of an issue that I'm aware of is the
one that started the thread, of a Unix system with incompatible
terminal and filesystem encodings (or was it extremely obscure shell
incantations? whatever, it was well beyond my level of Unix
knowledge).

I'd say YAGNI except that someone seems to have demonstrated a genuine
(if rare) need on Unix. I'll stick with YAGNI on Windows, though.
(Where's Uncle Tim when you need him, to point out that Windows is
the better platform? :-))

Paul.

PS I'm now so far out of my depth on Unicode issues that I'll drop out
of this thread at this point.

