[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Aahz aahz at pythoncraft.com
Thu Apr 30 04:50:50 CEST 2009

On Thu, Apr 30, 2009, Cameron Simpson wrote:
> The lengthy discussion mostly revolves around:
>   - Glenn points out that strings that came _not_ from listdir, and that are
>     _not_ well-formed unicode (== "have bare surrogates in them") but that
>     were intended for use as filenames will conflict with the PEP's scheme -
>     programs must know that these strings came from outside and must
>     translate them into the PEP's funny-encoding before using them in the
>     os.* functions. Before the PEP such strings would be used directly;
>     after it they would encode differently, producing different POSIX
>     filenames. Breakage.
>   - Glenn would like the encoding to use Unicode scalar values only,
>     with a rare-in-filenames character as the escape.
>     That would avoid the issue with "outside" strings that contain
>     surrogates. To my mind it just moves the punning from rare illegal
>     strings to merely uncommon but legal characters.
>   - Some parties think it would be better for os.listdir to return not
>     plain strings but a subclass of string (or at least a duck-type of
>     string) that knows where it came from and is also handily
>     recognisable as not-really-a-string, for purposes of deciding
>     whether it is PEP-funny-encoded by direct inspection.

Assuming people agree that this is an accurate summary, it should be
incorporated into the PEP.
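The breakage in the first point can be shown concretely. This is a minimal sketch that assumes UTF-8 as the filesystem encoding. It uses the `surrogateescape` error handler that PEP 383 introduced, and the later `surrogatepass` handler to reproduce a UTF-8 encoder that tolerates lone surrogates. Decoding the undecodable byte 0xFF yields the lone surrogate U+DCFF. A string built elsewhere that happens to contain U+DCFF is therefore indistinguishable from a listdir result, and it encodes to the single byte 0xFF instead of the three-byte surrogate sequence.

```python
# Sketch: assumes UTF-8 is the filesystem encoding.
raw = b"caf\xff"  # 0xFF can never appear in valid UTF-8

# PEP 383 decoding maps the bad byte to the lone surrogate U+DCFF...
name = raw.decode("utf-8", "surrogateescape")

# ...and encoding with the same handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# A string from "outside" that merely contains U+DCFF looks identical to `name`.
outside = "caf" + chr(0xDCFF)
escaped = outside.encode("utf-8", "surrogateescape")  # PEP 383 behaviour
passed = outside.encode("utf-8", "surrogatepass")     # surrogate-tolerant UTF-8

# Strict UTF-8 (Python 3.1 and later) refuses the string outright.
try:
    outside.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(escaped)  # b'caf\xff'
print(passed)   # b'caf\xed\xb3\xbf'
```

Both byte strings are plausible POSIX filenames, but they name different files. That difference is the breakage described above.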
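Glenn's alternative can be sketched in the same terms. This is a hypothetical scheme, not his actual proposal. The escape character `ESC` (U+E000, a private-use code point), the `ESC` + two-hex-digit form for undecodable bytes, and the functions `decode_name` and `encode_name` are all illustrative assumptions. UTF-8 is again assumed as the filesystem encoding. Every output character is a Unicode scalar value, so no surrogates are involved. However, a perfectly legal string containing `ESC` now gets reinterpreted, which is the punning objection in the second point.

```python
# Hypothetical escape scheme; ESC is an arbitrary placeholder choice.
ESC = "\ue000"  # private-use code point, assumed rare in real filenames
_HEX = set("0123456789ABCDEF")


def decode_name(raw: bytes) -> str:
    """Bytes -> str using only Unicode scalar values (no lone surrogates)."""
    out = []
    # surrogateescape is used here only as a convenient way to locate bad bytes.
    for ch in raw.decode("utf-8", "surrogateescape"):
        if "\udc80" <= ch <= "\udcff":
            out.append(f"{ESC}{ord(ch) - 0xDC00:02X}")  # bad byte -> ESC + hex
        elif ch == ESC:
            out.append(ESC + ESC)  # a literal ESC must itself be escaped
        else:
            out.append(ch)
    return "".join(out)


def encode_name(name: str) -> bytes:
    """Inverse of decode_name: str -> original bytes."""
    out = bytearray()
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == ESC and name[i + 1:i + 2] == ESC:
            out += ESC.encode("utf-8")  # doubled ESC -> one literal ESC
            i += 2
            continue
        pair = name[i + 1:i + 3]
        if ch == ESC and len(pair) == 2 and set(pair) <= _HEX:
            out.append(int(pair, 16))  # ESC + two hex digits -> raw byte
            i += 3
            continue
        out += ch.encode("utf-8")  # ordinary character, strict UTF-8
        i += 1
    return bytes(out)
```

Round trips through `decode_name` and `encode_name` are lossless. A user-typed name such as `"x\ue000FF"` is a different matter. Unless the caller first doubles the `ESC`, it encodes to `b"x\xff"` rather than the `b"x\xee\x80\x80FF"` that plain UTF-8 would give.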
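The third suggestion, a string type that remembers its origin, might look roughly like this. It is a hypothetical sketch. The class `OSName`, its `to_bytes` method, and the helpers `tagged_listdir` and `to_os_bytes` are illustrative names, not a proposed API. The sketch also assumes a POSIX system with UTF-8 as the filesystem encoding.

```python
import os


class OSName(str):
    """Hypothetical str subclass marking a name that came from the OS.

    Its text may contain PEP 383 surrogate escapes (U+DC80..U+DCFF)
    standing in for undecodable bytes.
    """

    def to_bytes(self, encoding: str = "utf-8") -> bytes:
        # Reverse the PEP 383 mapping to recover the original bytes.
        return self.encode(encoding, "surrogateescape")


def tagged_listdir(path: str = ".") -> list:
    # Hypothetical wrapper: tag every entry returned by os.listdir.
    return [OSName(n) for n in os.listdir(path)]


def to_os_bytes(name: str, encoding: str = "utf-8") -> bytes:
    # Origin decides the encoding. Escapes are honoured only for OSName;
    # any other str must be well-formed and is encoded strictly, so a lone
    # surrogate raises UnicodeEncodeError instead of being punned.
    if isinstance(name, OSName):
        return name.to_bytes(encoding)
    return name.encode(encoding)
```

One caveat of the subclass approach is that most `str` operations, such as slicing, concatenation and `os.path.join`, return a plain `str`. The origin tag is therefore lost as soon as the name is manipulated.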
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair
