[Python-Dev] Bytes path related questions for Guido

Nick Coghlan ncoghlan at gmail.com
Sun Aug 24 06:44:36 CEST 2014

At Guido's request, I'm splitting out two specific questions from
Serhiy's thread where I believe we could do with an explicit "yes or
no" from him:

1. Should we accept patches adding support for the direct use of bytes
paths in lower level filesystem manipulation APIs? (i.e. everything
that isn't pathlib)

This was Serhiy's original question (due to some open issues [1,2]). I
think the answer is yes, as we already do so in some cases. The
"pathlib doesn't support binary paths" design decision is about the
split between a high level platform independent API and a low level
potentially platform dependent API, rather than being about
disallowing the use of bytes paths in general.

[1] http://bugs.python.org/issue19997
[2] http://bugs.python.org/issue20797
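
As a rough illustration of the "we already do in some cases" point,
here is a minimal sketch of existing bytes path support in the os
module. The file name and variable names are made up for the example;
os.fsencode, os.path.join, os.listdir and open are the real APIs.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Convert the str directory name to bytes using the filesystem encoding.
    bytes_dir = os.fsencode(tmp)

    # os.path.join accepts bytes as long as every component is bytes.
    bytes_path = os.path.join(bytes_dir, b"example.txt")

    # open() accepts a bytes path directly.
    with open(bytes_path, "wb") as f:
        f.write(b"data")

    # Bytes in, bytes out: listdir mirrors the type of its argument.
    names = os.listdir(bytes_dir)
    # The same directory listed via a str path yields str names.
    str_names = os.listdir(tmp)
```

The question above is whether the remaining low level APIs should
follow the same "type in, same type out" pattern.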

2. Should we add some additional helpers to the string module for
dealing with surrogate escaped bytes and other techniques for
smuggling arbitrary binary data as text?

My proposal [3] is to add:

* string.escaped_surrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another
  specified code point
* string.redecode(s, encoding): encodes a string back to bytes and
  then decodes it again using the specified encoding (the old encoding
  defaults to 'latin-1' to match the assumptions in WSGI)

"s != string.clean(s)" would then serve as a check for "does this
string contain any surrogate escaped bytes?"

[3] http://bugs.python.org/issue18814#msg225791
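
To make the proposal more concrete, here is a minimal sketch of how
the three helpers might behave. None of these names exist in the
string module today. The parameter names (replacement, old_encoding),
the use of the surrogateescape error handler in redecode, and the
U+DC80 to U+DCFF range as the contents of the constant are my
assumptions for illustration, not a settled design.

```python
# Hypothetical sketch of the proposed helpers; not part of the stdlib.

# The 128 code points that the surrogateescape error handler uses to
# smuggle the undecodable bytes 0x80-0xFF (U+DC80 through U+DCFF).
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))


def clean(s, replacement="\ufffd"):
    """Replace every escaped surrogate in s with the replacement string."""
    table = dict.fromkeys(range(0xDC80, 0xDD00), replacement)
    return s.translate(table)


def redecode(s, encoding, old_encoding="latin-1"):
    """Encode s back to bytes with old_encoding, then decode as encoding.

    Assumption: surrogateescape is used in both directions so that
    smuggled bytes survive the round trip.
    """
    raw = s.encode(old_encoding, errors="surrogateescape")
    return raw.decode(encoding, errors="surrogateescape")


# A byte that is invalid UTF-8 becomes an escaped surrogate on decoding.
smuggled = b"caf\xe9".decode("utf-8", errors="surrogateescape")
has_escaped_bytes = smuggled != clean(smuggled)

# WSGI-style text: UTF-8 bytes that were decoded as latin-1.
fixed = redecode("caf\xc3\xa9", "utf-8")
```

Here has_escaped_bytes plays the role of the "s != string.clean(s)"
check described above.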


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
