[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Toshio Kuratomi a.badger at gmail.com
Fri Apr 24 23:26:12 CEST 2009

Glenn Linderman wrote:
> On approximately 4/24/2009 11:40 AM, came the following characters from
> And so my encoding (1) doesn't alter the data stream for any valid
> Windows file name, and where the naivest of users reside (2) doesn't
> alter the data stream for any Posix file name that was encoded as UTF-8
> sequences and doesn't contain ? characters in the file name [I perceive
> the use of ? in file names to be rare on Posix, because of experience,
> and because of the other problems caused by such use] (3) doesn't
> introduce data puns within applications that are correctly coded to know
> the encoding occurs.  The encoding technique in the PEP not only can
> produce data puns, thus not being reversible, it provides no reliable
> mechanism to know that this has occurred.
Uhm...  Not arguing with your goals, but '?' is unfortunately reasonably
easy to get into a filename.  For instance, I've had to download a lot
of scratch-built packages from our build system recently.  Scratch builds
have URLs with query strings in them so::


Which results in the filename:


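A minimal sketch of how this can happen, assuming a download tool that names
the local file after everything following the last '/' in the URL.  The URL,
host, and both helper functions below are hypothetical and only illustrate
the effect; they are not the actual build system or tool involved:

```python
from urllib.parse import urlsplit


def naive_local_name(url: str) -> str:
    """Name the file after everything following the last '/' (query string included)."""
    return url.rsplit("/", 1)[-1]


def path_only_name(url: str) -> str:
    """Name the file after the last path segment only, dropping the query string."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


# Hypothetical scratch-build URL carrying a query string.
url = "http://buildsys.example.org/getfile/example-1.0-1.src.rpm?taskID=12345"

print(naive_local_name(url))  # the '?' ends up in the filename
print(path_only_name(url))    # the '?' and everything after it is dropped
```

With the naive naming, the '?' lands in the filename on disk without the user
ever typing it, which is why its use on Posix may be less rare than it seems.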