[Python-Dev] [Python-3000] Proposed Python 3.0 schedule

Wed Oct 8 00:22:13 CEST 2008

On Oct 7, 2008, at 4:45 PM, Adam Olsen wrote:
> So what does Qt do when given a file name already using those PUA?
> Looks like they get passed through untouched when decoded, but will
> get translated into invalid names upon encoding.

Well, I'd say that looks like a bug. It should probably decode those
PUA characters as if they were undecodable sequences so that they too
round-trip properly.
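
A minimal sketch of that idea in Python, not Qt's actual code. The
escape range starting at U+F700 and the names decode_filename and
encode_filename are assumptions made up for illustration. Bytes that
are not valid UTF-8 are mapped to one private-use character each, and
a validly encoded character that already falls inside the escape range
is treated the same way, so every byte string survives the round trip.

```python
# Hypothetical escape range: one private-use code point per byte value.
# The base U+F700 is an assumption for illustration, not Qt's actual choice.
PUA_BASE = 0xF700


def _escape(byte: int) -> str:
    """Map a single raw byte to its private-use stand-in."""
    return chr(PUA_BASE + byte)


def _is_escape(ch: str) -> bool:
    """True if ch lies inside the (assumed) escape range."""
    return PUA_BASE <= ord(ch) <= PUA_BASE + 0xFF


def decode_filename(raw: bytes) -> str:
    out = []
    i = 0
    while i < len(raw):
        ch = None
        length = 1
        # Find the single UTF-8 character starting at i, if there is one.
        for length in (1, 2, 3, 4):
            try:
                ch = raw[i:i + length].decode("utf-8")
                break
            except UnicodeDecodeError:
                continue
        if ch is None:
            # Undecodable byte: escape it and move on by one byte.
            out.append(_escape(raw[i]))
            i += 1
        elif _is_escape(ch):
            # Valid UTF-8, but it collides with the escape range, so
            # treat its bytes as if they were undecodable.
            out.extend(_escape(b) for b in raw[i:i + length])
            i += length
        else:
            out.append(ch)
            i += length
    return "".join(out)


def encode_filename(name: str) -> bytes:
    # Escape characters turn back into their raw byte; all else is UTF-8.
    return b"".join(
        bytes([ord(ch) - PUA_BASE]) if _is_escape(ch) else ch.encode("utf-8")
        for ch in name
    )
```

With this scheme a latin-1 name such as b"caf\xe9.pdf" and a name that
already contains the UTF-8 encoding of an escape character both encode
back to exactly the bytes they started as.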

> So you still have
> file names you can't open

In practical terms, I suspect nobody has ever run into a file which
has this problem. You certainly can't say that is the case for
Python 3's current behavior; my suspicion is that anyone who uses any
non-ASCII filenames at all will run into issues with Python 3's
behavior at least once.

> , and you're incompatible with what other
> libraries do.

I'm sure there's a situation where that matters, but, at least I can  
run kpdf /any/arbitrary/file.pdf and have it work. And use the KDE  
file chooser, and have it able to browse my files, and choose any  
file, no matter what random characters it has in it. If there is an  
issue with interfacing to another library, the string can be converted  
to whatever the other library expects at the interface point...

People keep claiming that odd filenames are only going to be an issue  
for "backup tools", but I don't think that's true. I think it'll be an  
issue for most any program that reads user-specified files. Whether it  
be by running Python in an ASCII (e.g. "C") locale when there are  
files created with UTF-8 names, or by having copied/downloaded a file  
with an incorrectly encoded name, it's going to come up, and be an  
irritant when it does.
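
Both cases can be reproduced with plain byte strings. This is only a
sketch of the decoding step, not of what any particular Python release
does with the result; the sample name and the helper try_decode are
made up for illustration.

```python
# The same (hypothetical) file name stored under two different encodings.
utf8_name = "r\u00e9sum\u00e9.pdf".encode("utf-8")
latin1_name = "r\u00e9sum\u00e9.pdf".encode("latin-1")


def try_decode(raw, encoding):
    """Return the decoded name, or None if raw is invalid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Case 1: ASCII ("C") locale, file created with a UTF-8 name.
ascii_view = try_decode(utf8_name, "ascii")

# Case 2: UTF-8 locale, file copied over with a latin-1 name.
utf8_view = try_decode(latin1_name, "utf-8")

print(ascii_view, utf8_view)
```

Neither name decodes, although both are perfectly ordinary names to
the filesystem, which only sees bytes.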

That Qt felt the need to make this change rather strengthens that  
point IMO...

> The only thing going for Qt is that they seem specifically interested
> in latin-1, rather than arbitrary bad names.  The latin-1 strings that
> would correspond to the UTF-8 PUA used would include at least one
> control character, as well as other unusual bits, so it's pretty
> unlikely to encounter a real latin-1 file name like that.

I'd say they're most concerned about files that their users are likely  
to run into, yes.
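
The point about control characters can be checked with a short sketch.
The escape range starting at U+F700 is an assumption for illustration,
not necessarily the range Qt uses, and the helper names are made up.
For that range, the UTF-8 bytes of every escape character, read back
as latin-1, contain a C1 control character; other private-use code
points need not have this property.

```python
import unicodedata

# Hypothetical escape range base, assumed for illustration only.
PUA_BASE = 0xF700


def as_latin1(code_point):
    """UTF-8 encode a code point, then read those bytes as latin-1."""
    return chr(code_point).encode("utf-8").decode("latin-1")


def has_control(text):
    """True if any character is in Unicode category Cc (control)."""
    return any(unicodedata.category(ch) == "Cc" for ch in text)


# A latin-1 file name would need this exact three-character sequence
# to be mistaken for the escape character U+F7E9.
sample = as_latin1(PUA_BASE + 0xE9)
print([hex(ord(ch)) for ch in sample])

# Check the whole assumed range, one escape character per byte value.
all_have_control = all(
    has_control(as_latin1(PUA_BASE + b)) for b in range(256)
)
print(all_have_control)
```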

James