[Python-Dev] Bytes path related questions for Guido

Nick Coghlan ncoghlan at gmail.com
Mon Aug 25 01:19:19 CEST 2014

On 25 Aug 2014 03:55, "Guido van Rossum" <guido at python.org> wrote:
> Yes on #1 -- making the low-level functions more usable for edge cases by
> supporting bytes seems fine (as long as the support for strings, where it
> exists, is not compromised).


> The status of pathlib is a little unclear to me -- is there a plan to
> eventually support bytes or not?

It's text only and Antoine plans to keep it that way - the concatenation
operations, etc., are really only safe if you decode first.
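
As a rough illustration of "decode first", here is a minimal sketch that
converts a bytes path to text with os.fsdecode() before handing it to
pathlib. The helper name to_path is made up for this example and is not
part of pathlib or os.

```python
import os
from pathlib import Path


def to_path(raw):
    """Return a pathlib.Path for a str or bytes filesystem path.

    Hypothetical helper, not a stdlib API. os.fsdecode() decodes bytes
    with the filesystem encoding and passes str through unchanged.
    """
    return Path(os.fsdecode(raw))


# Once the path is text, joining segments with "/" is safe.
p = to_path(b"data") / "report.txt"
print(p.name)
```

If the original bytes are needed again later, os.fsencode(str(p)) is the
matching inverse operation.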

> For #2 I think you should probably just work with the others you have

Yes, that sounds like a good idea. There's been some good progress on the
issue tracker, so I think we can thrash out some workable (and
comprehensible!) utilities that will be useful in their own right while
also serving as aids to understanding for the underlying mechanisms.


> On Sat, Aug 23, 2014 at 9:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> At Guido's request, splitting out two specific questions from Serhiy's
>> thread where I believe we could do with an explicit "yes or no" from
>> him.
>> 1. Should we accept patches adding support for the direct use of bytes
>> paths in lower level filesystem manipulation APIs? (i.e. everything
>> that isn't pathlib)
>> This was Serhiy's original question (due to some open issues [1,2]). I
>> think the answer is yes, as we already do so in some cases. The
>> "pathlib doesn't support binary paths" design decision reflects the
>> split between a high-level, platform-independent API and low-level,
>> potentially platform-dependent APIs; it is not about disallowing the
>> use of bytes paths in general.
>> [1] http://bugs.python.org/issue19997
>> [2] http://bugs.python.org/issue20797
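
For context on "as we already do in some cases": a small sketch of one
lower-level API that accepts bytes paths today. os.listdir() returns names
of the same type as its argument. The helper name list_names and the file
name example.txt are made up for the illustration.

```python
import os
import tempfile


def list_names(directory):
    """Return sorted entry names; bytes in gives bytes out, str gives str."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so there is something to list.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass
    str_names = list_names(tmp)                 # list of str
    bytes_names = list_names(os.fsencode(tmp))  # list of bytes

print(str_names, bytes_names)
```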
>> 2. Should we add some additional helpers to the string module for
>> dealing with surrogate escaped bytes and other techniques for
>> smuggling arbitrary binary data as text?
>> My proposal [3] is to add:
>> * string.escaped_surrogates (constant with the 128 escaped code points)
>> * string.clean(s): replaces surrogates with '\ufffd' or another
>> specified code point
>> * string.redecode(s, encoding): encodes a string back to bytes and
>> then decodes it again using the specified encoding (the old encoding
>> defaults to 'latin-1' to match the assumptions in WSGI)
>> "s != string.clean(s)" would then serve as a check for "does this
>> string contain any surrogate escaped bytes?"
>> [3] http://bugs.python.org/issue18814#msg225791
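
To make the proposal above concrete, here is one possible sketch of the
three helpers. None of these names exist in the string module; the
upper-case constant name, the old_encoding and errors parameters, and the
choice of the surrogateescape handler in redecode are assumptions made for
this illustration, not details settled in the proposal.

```python
# Hypothetical sketch of the proposed helpers; not part of the stdlib.

# The 128 lone surrogates (U+DC80..U+DCFF) that the "surrogateescape"
# error handler produces for undecodable bytes 0x80..0xFF.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))
_ESCAPED = frozenset(ESCAPED_SURROGATES)


def clean(s, replacement="\ufffd"):
    """Replace surrogate escaped code points with a replacement character."""
    return "".join(replacement if ch in _ESCAPED else ch for ch in s)


def redecode(s, encoding, old_encoding="latin-1", errors="surrogateescape"):
    """Encode s back to bytes with old_encoding, then decode with encoding."""
    return s.encode(old_encoding, errors).decode(encoding, errors)


# Bytes that are invalid UTF-8 get smuggled through as a lone surrogate.
escaped = b"abc\xff".decode("utf-8", "surrogateescape")
has_escaped_bytes = escaped != clean(escaped)

# UTF-8 data that was wrongly decoded as latin-1 can be repaired.
misdecoded = "caf\xe9".encode("utf-8").decode("latin-1")
repaired = redecode(misdecoded, "utf-8")
```

With these definitions, has_escaped_bytes is the "s != clean(s)" check
described above, and repaired is the text decoded with the intended
encoding.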
>> Regards,
>> Nick.
>> --
>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> --
> --Guido van Rossum (python.org/~guido)