[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

INADA Naoki songofacandy at gmail.com
Thu Dec 7 01:49:30 EST 2017


> I care only about the builtin open()'s behavior.
> PEP 538 doesn't change the default error handler of open().
>
> I think PEP 538 and PEP 540 should behave almost identically, except for
> whether they change the locale. So I need a very strong reason if PEP 540
> changes the default error handler of open().
>

I just came up with a crazy idea: change the default error handler of open()
to "surrogateescape" only when the open mode is "w" or "a".
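
To make the proposal concrete, here is a minimal sketch of the rule as a
helper that picks the error handler from the mode and encoding.
`proposed_errors()` and `proposed_open()` are hypothetical names, not a
CPython API. Treating read/write modes such as "w+" as strict is an
assumption, because the proposal only mentions "w" and "a".

```python
import codecs


def proposed_errors(mode: str, encoding: str) -> str:
    """Hypothetical: return the error handler the proposal would choose."""
    # codecs.lookup() normalizes aliases such as "UTF8", "utf_8", "UTF-8".
    is_utf8 = codecs.lookup(encoding).name == "utf-8"
    # Assumption: only write-only/append-only modes get the lenient handler;
    # "+" modes can also read, so this sketch keeps them strict.
    writing_only = ("w" in mode or "a" in mode) and "+" not in mode
    if is_utf8 and writing_only:
        return "surrogateescape"
    return "strict"


def proposed_open(path, mode="r", encoding="utf-8"):
    """Hypothetical open() wrapper that applies proposed_errors()."""
    return open(path, mode, encoding=encoding,
                errors=proposed_errors(mode, encoding))
```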

When reading, the "surrogateescape" error handler is dangerous because
it can silently produce arbitrary broken Unicode strings by mistake.
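
A small illustration of the reading hazard, using the first bytes of a
JPEG file as sample input (the byte values are chosen for illustration):
"strict" fails loudly, while "surrogateescape" quietly returns a string
made of lone surrogates that is not real text.

```python
# Bytes that are not valid UTF-8 (the start of a JPEG file).
data = b"\xff\xd8\xff\xe0"

# With the default "strict" handler, decoding fails loudly.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "surrogateescape", each undecodable byte silently becomes a lone
# surrogate in U+DC80..U+DCFF, so the caller gets a "string" that is not text.
text = data.decode("utf-8", "surrogateescape")
print(strict_failed)  # True
print(ascii(text))    # '\udcff\udcd8\udcff\udce0'
```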

On the other hand, the "surrogateescape" error handler is not so dangerous
for writing if the encoding is UTF-8.
Writing a normal Unicode string never creates broken data.
When writing a string containing surrogateescaped data, the data was
already (partially) broken before it was written.
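
The writing side can be checked the same way. In this sketch (the file
names are made up for illustration), a normal string encodes identically
under both handlers, and a surrogateescaped name is written back as exactly
the bytes it was decoded from, so nothing new gets broken.

```python
# A normal Unicode string encodes identically with either handler.
name = "résumé.txt"
assert name.encode("utf-8") == name.encode("utf-8", "surrogateescape")

# A file name that arrived as undecodable bytes (Latin-1 "é" is 0xE9).
raw = b"caf\xe9.txt"
escaped = raw.decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'

# "strict" refuses to write it ...
try:
    escaped.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # strict: surrogates not allowed

# ... while "surrogateescape" writes back exactly the original bytes.
print(escaped.encode("utf-8", "surrogateescape") == raw)  # True
```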

This idea allows the following code:

    import os

    with open("files.txt", "w") as f:
        for fn in os.listdir():  # may return surrogateescaped strings
            f.write(fn + '\n')

And it still rejects the following code:

    def read_image():
        with open("image.jpg", "r") as f:  # binary data, not UTF-8
            return f.read()  # still raises UnicodeDecodeError


I'm not sure whether this is a good idea. And I don't know when it would
be appropriate to change the write error handler: only when PEP 538 or
PEP 540 is in effect? Or always when the filesystem encoding
(sys.getfilesystemencoding()) is UTF-8?
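
For reference, a sketch of how the two candidate conditions could be
detected at runtime. Both helper names are hypothetical; note that
`sys.flags.utf8_mode` only exists on Python 3.7+, where PEP 540 was
eventually implemented, and PEP 538 exposes no equivalent flag.

```python
import codecs
import sys


def fs_encoding_is_utf8() -> bool:
    """Hypothetical: is the filesystem encoding UTF-8 (any alias)?"""
    # codecs.lookup() normalizes aliases such as "utf8" and "UTF-8".
    return codecs.lookup(sys.getfilesystemencoding()).name == "utf-8"


def utf8_mode_enabled() -> bool:
    """Hypothetical: is PEP 540's UTF-8 mode active?"""
    # getattr() keeps this working on interpreters older than 3.7.
    return bool(getattr(sys.flags, "utf8_mode", 0))
```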

Any thoughts?

INADA Naoki  <songofacandy at gmail.com>

