[Python-Dev] Bytes path related questions for Guido

Nick Coghlan ncoghlan at gmail.com
Mon Aug 25 01:19:19 CEST 2014


On 25 Aug 2014 03:55, "Guido van Rossum" <guido at python.org> wrote:
>
> Yes on #1 -- making the low-level functions more usable for edge cases by
> supporting bytes seems fine (as long as the support for strings, where it
> exists, is not compromised).

Thanks!
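
For anyone following along, here is a small sketch of what "supporting
bytes" already looks like in one low-level function. os.listdir() returns
names in the same type as its argument. The helper name list_names and the
file name example.txt are made up for illustration only.

```python
import os
import tempfile


def list_names(directory):
    """Return sorted directory entries, typed like the argument (str or bytes)."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so there is something to list.
    open(os.path.join(tmp, "example.txt"), "w").close()

    text_names = list_names(tmp)                # str in, str out
    bytes_names = list_names(os.fsencode(tmp))  # bytes in, bytes out
```

The patches under discussion are about extending this kind of str/bytes
symmetry to the low-level functions that don't offer it yet.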

> The status of pathlib is a little unclear to me -- is there a plan to
> eventually support bytes or not?

It's text only and Antoine plans to keep it that way: the concatenation
operations, etc., are really only safe if you decode first.
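
A rough sketch of the "decode first" pattern, assuming a POSIX-style path
whose bytes are not valid UTF-8. The path itself is invented for the
example, and the explicit "utf-8" codec is an assumption chosen to keep the
example platform independent (os.fsdecode() would use the filesystem
encoding instead).

```python
from pathlib import PurePosixPath

raw = b"logs/caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8

# pathlib refuses bytes outright.
try:
    PurePosixPath(raw)
except TypeError:
    rejected = True
else:
    rejected = False

# Decode with surrogateescape so the undecodable byte survives as U+DCE9.
text = raw.decode("utf-8", "surrogateescape")
path = PurePosixPath(text)

# Path operations now work on text, and the original bytes can be recovered.
sibling = path.with_name("other.txt")
round_tripped = str(path).encode("utf-8", "surrogateescape")
```

The escaped byte rides along inside the str, so joining and splitting stay
well defined while the original bytes remain recoverable at the boundary.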

>
> For #2 I think you should probably just work with the others you have
> mentioned.

Yes, that sounds like a good idea. There's been some good progress on the
issue tracker, so I think we can thrash out some workable (and
comprehensible!) utilities that will be useful in their own right while
also serving as aids to understanding for the underlying mechanisms.
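
To make the proposal quoted below a bit more concrete, here is one possible
sketch of the three helpers as standalone functions. None of this exists in
the string module; the names follow the proposal, while the signatures, the
use of surrogateescape on both the encode and decode steps, and the
translate-based implementation are my assumptions.

```python
# The 128 code points that surrogateescape uses for bytes 0x80-0xFF.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))


def clean(s, replacement="\ufffd"):
    """Replace every escaped surrogate in s with the replacement string."""
    table = dict.fromkeys(map(ord, ESCAPED_SURROGATES), replacement)
    return s.translate(table)


def redecode(s, encoding, old_encoding="latin-1", errors="surrogateescape"):
    """Encode s back to bytes with old_encoding, then decode with encoding."""
    return s.encode(old_encoding, errors).decode(encoding, errors)


# A byte that is invalid UTF-8 gets smuggled through as U+DCE9.
smuggled = b"caf\xe9".decode("utf-8", "surrogateescape")
has_escaped_bytes = smuggled != clean(smuggled)

# UTF-8 data wrongly decoded as latin-1 (the WSGI case) can be repaired.
mojibake = "caf\u00e9".encode("utf-8").decode("latin-1")
repaired = redecode(mojibake, "utf-8")
```

The "s != clean(s)" comparison is the "does this string contain any
surrogate escaped bytes?" check mentioned in the proposal.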

Cheers,
Nick.

>
>
> On Sat, Aug 23, 2014 at 9:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> At Guido's request, splitting out two specific questions from Serhiy's
>> thread where I believe we could do with an explicit "yes or no" from
>> him.
>>
>> 1. Should we accept patches adding support for the direct use of bytes
>> paths in lower level filesystem manipulation APIs? (i.e. everything
>> that isn't pathlib)
>>
>> This was Serhiy's original question (due to some open issues [1,2]). I
>> think the answer is yes, as we already do in some cases. The "pathlib
>> doesn't support binary paths" design decision reflects the split between
>> a high-level, platform-independent API and a low-level, potentially
>> platform-dependent one; it is not about disallowing the use of bytes
>> paths in general.
>>
>> [1] http://bugs.python.org/issue19997
>> [2] http://bugs.python.org/issue20797
>>
>> 2. Should we add some additional helpers to the string module for
>> dealing with surrogate escaped bytes and other techniques for
>> smuggling arbitrary binary data as text?
>>
>> My proposal [3] is to add:
>>
>> * string.escaped_surrogates (constant with the 128 escaped code points)
>> * string.clean(s): replaces surrogates with '\ufffd' or another
>> specified code point
>> * string.redecode(s, encoding): encodes a string back to bytes and
>> then decodes it again using the specified encoding (the old encoding
>> defaults to 'latin-1' to match the assumptions in WSGI)
>>
>> "s != string.clean(s)" would then serve as a check for "does this
>> string contain any surrogate escaped bytes?"
>>
>> [3] http://bugs.python.org/issue18814#msg225791
>>
>> Regards,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)