[Python-3000] Making more effective use of slice objects in Py3k
Josiah Carlson
jcarlson at uci.edu
Tue Aug 29 21:04:35 CEST 2006
"Guido van Rossum" <guido at python.org> wrote:
> On 8/29/06, Josiah Carlson <jcarlson at uci.edu> wrote:
> > String operations always returning views would be arguably insane. I
> > hope no one was recommending it (I certainly wasn't, but if my words
> > were confusing on that part, I apologize); strings are strings, and
> > views should only be constructed explicitly.
>
> I don't know about you, but others have definitely been arguing for
> that passionately in the past.
>
> > After you have a view, I'm of the opinion that view operations should
> > return views, except in the case where you explicitly ask for a string
> > via str(view).
>
> I think it's a mixed bag, and depends on the semantics of the operation.
>
> For operations that are guaranteed to return a substring (like slicing
> or partition() -- are there even others?) I think views should return
> views (on the original buffer, never views on views).
I agree.
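
To make the "views on the original buffer, never views on views" rule concrete, here is a minimal sketch. StrView is a hypothetical class written only for illustration; it is not the implementation discussed in this thread. It assumes contiguous slices only, and shows slicing and partition() re-anchoring every result on the original string.

```python
class StrView:
    """Hypothetical read-only view over a str (illustration only)."""

    def __init__(self, base, start=0, stop=None):
        self.base = base  # always the original str, never another view
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # The only place where characters are copied out of the buffer.
        return self.base[self.start:self.stop]

    def __getitem__(self, index):
        if isinstance(index, slice):
            lo, hi, step = index.indices(len(self))
            if step != 1:
                raise ValueError("this sketch supports contiguous slices only")
            hi = max(hi, lo)
            # Translate to offsets in the original string so that a slice
            # of a view is still a view on the original buffer.
            return StrView(self.base, self.start + lo, self.start + hi)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StrView index out of range")
        return self.base[self.start + index]

    def partition(self, sep):
        # Search inside the viewed range without copying it first.
        pos = self.base.find(sep, self.start, self.stop)
        if pos < 0:
            return self, self[0:0], self[0:0]
        rel = pos - self.start
        return self[:rel], self[rel:rel + len(sep)], self[rel + len(sep):]
```

With this sketch, StrView(s)[2:9][1:4] holds a reference to s itself plus two offsets, so chained slicing never builds a chain of views.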
> For operations that may be forced to return a new string (e.g.
> concatenation) I think the return value should always be a new string,
> even if it could be optimized. So for example if v is a view and s is
> a string, v+s should always return a new string, even if s is empty.
I'm on the fence about this. On the one hand, I understand the
desirability of being able to get the underlying string object without
difficulty. On the other hand, its performance characteristics could be
confusing to users of Python who may have come to expect that "st+''" is
a constant time operation, regardless of the length of st.
For addition with a non-empty string, I agree that it could make some
sense to return the string (considering you will need to copy it anyway),
but if one returned a view on that string, it would be more consistent with
other methods, and getting the string back via str(view) would offer
equivalent functionality. It would also require the user to be explicit
about what they really want; though there is the argument that if I'm
passing a string as an operand to addition with a view, I actually want
a string, so give me one.
I'm going to implement it as returning a view, but leave commented
sections for some of them to return a string.
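
The two choices for addition can be sketched as follows. StrView here is again a hypothetical stand-in, not the actual implementation; the commented-out line marks the alternative of always returning a string. Returning the same view when the other operand is empty is an assumption of this sketch, made to keep that case constant time.

```python
class StrView:
    """Hypothetical view over a str (illustration only)."""

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __str__(self):
        return self.base[self.start:self.stop]

    def __add__(self, other):
        if not isinstance(other, (str, StrView)):
            return NotImplemented
        other_text = str(other)
        if not other_text:
            # Nothing to append: hand back the same view, no copying.
            return self
        joined = str(self) + other_text  # this copy cannot be avoided
        # Alternative from the quoted text: always return a new string.
        # return joined
        return StrView(joined)
```

Under the view-returning variant the caller writes str(v + s) to get a string; under the alternative, v + s is already a str and the empty-operand shortcut would be dropped.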
> BTW beware that in py3k, strings (which will always be unicode
> strings) won't support the buffer API -- bytes objects will. Would you
> want views on strings or on bytes or on both?
That's tricky. Views on bytes will come for free, like array, mmap, and
anything else that supports the buffer protocol. It requires the removal
of the __hash__ method for mutables, but that is certainly expected.
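
For readers on a current Python, the built-in memoryview type plays roughly this role for objects that support the buffer protocol. It is not the view type being written in this thread, and it did not exist when this message was sent, so the snippet below is only an illustration of the idea.

```python
import array

# A mutable byte buffer and a view on it; slicing the view copies nothing.
data = bytearray(b"GET /index.html HTTP/1.0")
view = memoryview(data)
method = view[0:3]

# The sliced view still refers to the original object, not to `view`.
same_buffer = method.obj is data

# Changes to the underlying buffer are visible through the view.
data[0] = ord("P")
seen_through_view = bytes(method)

# Any buffer-protocol object works the same way, e.g. array.array.
nums = array.array("i", [10, 20, 30, 40])
tail = memoryview(nums)[2:].tolist()

# Mutable buffers such as bytearray do not define a usable __hash__.
try:
    hash(data)
    hashable = True
except TypeError:
    hashable = False
```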
Right now, a large portion of standard library code uses strings and
string methods to handle parsing, etc. Removing immutable byte strings
from 3.x seems likely to require a huge amount of rewriting to use
either bytes or text (something I have mentioned before). I believe
that with views on bytes (and/or sufficient bytes methods), the vast
majority of that code would likely end up using bytes.
Having a text view for such situations that works with the same kinds of
semantics as the bytes view would be nice from a purity/convenience
standpoint, and only needing to handle a single data type (text) could
make its implementation easier. I don't have any short-term plans of
writing text views, but it may be somewhat easier to do after I'm done
with string/byte views.
- Josiah