[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
aahz at pythoncraft.com
Thu Apr 30 04:50:50 CEST 2009
On Thu, Apr 30, 2009, Cameron Simpson wrote:
> The lengthy discussion mostly revolves around:
> - Glenn points out that strings that came _not_ from listdir, and that are
> _not_ well-formed Unicode (== "have bare surrogates in them") but that
> were intended for use as filenames, will conflict with the PEP's scheme -
> programs must know that these strings came from outside and must be
> translated into the PEP's funny-encoding before use in the os.*
> functions. Before the PEP they would be used directly; after the PEP
> they encode differently, thus producing different POSIX
> filenames. Breakage.
> - Glenn would like the encoding to use Unicode scalar values only,
> using a rare-in-filenames character.
> That would avoid the issue with "outside" strings that contain
> surrogates. To my mind it just moves the punning from rare illegal
> strings to merely uncommon but legal characters.
> - Some parties think it would be better to not return strings from
> os.listdir but a subclass of string (or at least a duck-type of
> string) that knows where it came from and is also handily
> recognisable as not-really-a-string for purposes of deciding
> whether it is PEP-funny-encoded by direct inspection.
Assuming people agree that this is an accurate summary, it should be
incorporated into the PEP.
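
For readers who want to see the first and third points of the summary concretely, here is a rough sketch. It assumes the error-handler name "surrogateescape", under which the PEP's scheme later shipped; the message above only calls it the "funny-encoding". The "surrogatepass" handler is used purely as a stand-in for "encode the bare surrogate directly", and FSName is a hypothetical class name, not anything proposed in the thread.

```python
# Sketch only: "surrogateescape" is the handler name the PEP's scheme
# later shipped under; "surrogatepass" stands in for encoding a bare
# surrogate directly (an assumption about the pre-PEP behaviour).

raw = b"caf\xff"  # a POSIX filename that is not valid UTF-8

# What listdir would hand back under the PEP: the undecodable byte 0xFF
# becomes the lone surrogate U+DCFF, and encoding restores the bytes.
name = raw.decode("utf-8", "surrogateescape")
roundtrip = name.encode("utf-8", "surrogateescape")

# An "outside" string that happens to contain the same bare surrogate.
outside = "caf\udcff"

# Under the PEP's scheme the surrogate is read as an escaped byte ...
with_pep = outside.encode("utf-8", "surrogateescape")
# ... whereas encoding the surrogate itself gives a different filename.
direct = outside.encode("utf-8", "surrogatepass")


class FSName(str):
    """Hypothetical str subclass marking names that came from the OS."""
    __slots__ = ()


tagged = FSName(name)
# Direct inspection now tells the two kinds of string apart.
is_tagged = isinstance(tagged, FSName)
is_outside_tagged = isinstance(outside, FSName)
```

Here with_pep and direct are different byte strings for the same str value, which is the breakage Glenn describes; the subclass is one way a program could tell by inspection which strings need no translation.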
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/
"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur." --Red Adair