[Python-Dev] Bytes path related questions for Guido
ncoghlan at gmail.com
Mon Aug 25 01:19:19 CEST 2014
On 25 Aug 2014 03:55, "Guido van Rossum" <guido at python.org> wrote:
> Yes on #1 -- making the low-level functions more usable for edge cases by
> supporting bytes seems fine (as long as the support for strings, where it
> exists, is not compromised).
> The status of pathlib is a little unclear to me -- is there a plan to
> eventually support bytes or not?
It's text only and Antoine plans to keep it that way - the concatenation
operations, etc, are really only safe if you decode first.
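
As a rough illustration of "decode first", the sketch below decodes a bytes
path with the surrogateescape error handler, does the joining in pathlib as
text, and only then encodes back to bytes. The sample file name and the choice
of UTF-8 are assumptions made for the example; os.fsdecode() and os.fsencode()
do the same kind of conversion using the platform's filesystem encoding.

```python
from pathlib import PurePosixPath

# A name encoded as Latin-1, so it is not valid UTF-8 (sample data only).
raw = b"data/caf\xe9.txt"

# Decode first: the undecodable byte 0xE9 is smuggled through as U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Path manipulation now happens on text, which is all pathlib supports.
joined = PurePosixPath(text) / "backup"

# Encoding with the same handler restores the original bytes exactly.
round_tripped = str(joined).encode("utf-8", "surrogateescape")
```

The round trip is lossless only because the same codec and error handler are
used in both directions.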
> For #2 I think you should probably just work with the others you have
Yes, that sounds like a good idea. There's been some good progress on the
issue tracker, so I think we can thrash out some workable (and
comprehensible!) utilities that will be useful in their own right while
also serving as aids to understanding for the underlying mechanisms.
> On Sat, Aug 23, 2014 at 9:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> At Guido's request, splitting out two specific questions from Serhiy's
>> thread where I believe we could do with an explicit "yes or no" from
>> 1. Should we accept patches adding support for the direct use of bytes
>> paths in lower level filesystem manipulation APIs? (i.e. everything
>> that isn't pathlib)
>> This was Serhiy's original question (due to some open issues [1,2]). I
>> think the answer is yes, as we already do in some cases, and the
>> "pathlib doesn't support binary paths" design decision is a high level
>> platform independent API vs low level potentially platform dependent
>> API one rather than being about disallowing the use of bytes paths in
>> [1] http://bugs.python.org/issue19997
>> [2] http://bugs.python.org/issue20797
>> 2. Should we add some additional helpers to the string module for
>> dealing with surrogate escaped bytes and other techniques for
>> smuggling arbitrary binary data as text?
>> My proposal is to add:
>> * string.escaped_surrogates (constant with the 128 escaped code points)
>> * string.clean(s): replaces surrogates with '\ufffd' or another
>> specified code point
>> * string.redecode(s, encoding): encodes a string back to bytes and
>> then decodes it again using the specified encoding (the old encoding
>> defaults to 'latin-1' to match the assumptions in WSGI)
>> "s != string.clean(s)" would then serve as a check for "does this
>> string contain any surrogate escaped bytes?"
>>  http://bugs.python.org/issue18814#msg225791
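
A minimal sketch of how the three proposed helpers might behave, written as
plain module-level names rather than as part of the string module (where none
of them exist). The upper-case constant name, the "replacement",
"old_encoding" and "errors" parameter names, and the use of surrogateescape
inside redecode() are assumptions made for illustration, not the final API.

```python
# Hypothetical stand-in for string.escaped_surrogates: the 128 code points
# U+DC80..U+DCFF that surrogateescape uses for bytes 0x80..0xFF.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))


def clean(s, replacement="\ufffd"):
    """Replace every escaped surrogate in s with the replacement string."""
    table = dict.fromkeys(map(ord, ESCAPED_SURROGATES), replacement)
    return s.translate(table)


def redecode(s, encoding, old_encoding="latin-1", errors="surrogateescape"):
    """Encode s back to bytes with old_encoding, then decode with encoding."""
    return s.encode(old_encoding, errors).decode(encoding, errors)


# A string carrying one escaped byte, as produced by surrogateescape decoding.
smuggled = b"caf\xe9".decode("utf-8", "surrogateescape")
has_escaped_bytes = smuggled != clean(smuggled)

# WSGI-style text: UTF-8 bytes that were decoded as Latin-1.
wsgi_text = b"caf\xc3\xa9".decode("latin-1")
fixed = redecode(wsgi_text, "utf-8")
```

Here has_escaped_bytes plays the role of the "s != string.clean(s)" check
described in the proposal.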
>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> --Guido van Rossum (python.org/~guido)