[Python-Dev] Filename as byte string in python 2.6 or 3.0?

"Martin v. Löwis" martin at v.loewis.de
Tue Sep 30 00:49:39 CEST 2008


> Originally I thought that this was a valid idea, but then it became
> clear that this could be a problem.  Consider a filename which includes
> a UTF-8 encoding of a PUA code point.

I still think it's a valid idea. For non-UTF-8 file system encodings,
use PUA characters, and generate them through an error handler.
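
A minimal sketch of that error-handler idea, written for Python 3. The
handler name "pua_escape" and the base code point PUA_BASE are assumptions
made up for this illustration, not an existing codec feature. Each
undecodable byte 0xNN becomes the character U+F0000 + 0xNN, and the same
handler reverses the mapping on encode so the original bytes round-trip.

```python
import codecs

# Hypothetical choice: map undecodable byte 0xNN to U+F0000 + 0xNN (Plane 15 PUA).
PUA_BASE = 0xF0000


def pua_escape(exc):
    """Error handler that escapes bad bytes as PUA characters and back."""
    if isinstance(exc, UnicodeDecodeError):
        # exc.object is the input bytes; replace the offending slice.
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        # exc.object is the input str; recover the original bytes.
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            offset = ord(ch) - PUA_BASE
            if not 0x80 <= offset <= 0xFF:
                raise exc  # not one of our escapes: a genuine encoding error
            out.append(offset)
        return bytes(out), exc.end
    raise exc


# The handler name is an assumption for this sketch.
codecs.register_error("pua_escape", pua_escape)

raw = b"caf\xe9.txt"                       # Latin-1 bytes, invalid as ASCII
name = raw.decode("ascii", "pua_escape")   # bad byte becomes a PUA character
restored = name.encode("ascii", "pua_escape")
```

A filename that already contains one of these PUA characters would be
ambiguous under this scheme, which is the collision concern raised in the
quoted text.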

If the file system encoding is UTF-8, use UTF-8b instead as the
file system encoding.
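
For illustration only: later Python 3 releases ship a "surrogateescape"
error handler (PEP 383), which that PEP describes as similar to UTF-8b.
It did not exist when this message was written, so treating it as what is
meant here is an assumption. Bytes that are not valid UTF-8 are decoded
to lone low surrogates and restored on encode.

```python
# Assumes Python 3.1 or later, where "surrogateescape" is available.
raw = b"caf\xe9.txt"   # 0xE9 is not valid UTF-8 in this position

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = name.encode("utf-8", "surrogateescape")
```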

> Viewing the PUA with GNOME charmap, I can see that many code points
> there have character renderings on my Ubuntu system.  I have to assume,
> therefore, that there are other (and potentially conflicting) uses for
> this unicode feature.

Depends on how you use it. If you use the PUA block 1 (i.e.
U+E000..U+F8FF), there is a realistic chance of collision.

If you use the Plane 15 or Plane 16 PUA blocks, there is currently
zero chance of collision (AFAIK). PUA has a wide use for additional
characters in TrueType, but I don't think many tools even support
plane 15 and 16 for generating fonts, or rendering them (it may even
be that the TrueType/OpenType format doesn't support them in the first
place). However, Python can make use of these planes fairly easily,
even in 2-byte mode (through UTF-16).
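
As a rough sketch of why these planes remain reachable with 16-bit code
units: a code point above U+FFFF is stored in UTF-16 as a surrogate pair.
The helper name to_surrogate_pair and the sample code point U+F0041 are
made up for this example.

```python
import struct


def to_surrogate_pair(cp):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


cp = 0xF0041   # arbitrary Plane 15 PUA code point chosen for illustration
high, low = to_surrogate_pair(cp)

# The built-in UTF-16 codec produces the same two 16-bit units.
encoded = chr(cp).encode("utf-16-be")
matches = encoded == struct.pack(">HH", high, low)
```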

Regards,
Martin

