[Python-3000] Making more effective use of slice objects in Py3k
jcarlson at uci.edu
Tue Aug 29 07:31:37 CEST 2006
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Josiah Carlson wrote:
> > If every operation on a view returned a string copy, then what would be
> > the point of the view in the first place?
> String views would have all the same methods as a real
> string, so you could find(), index(), etc. while operating
> efficiently on the original data. To my mind this is
> preferable to having little-used optional arguments on
> an easily-forgotten subset of the string methods: you
> only have to remember one thing (how to create a view)
> rather than a bunch of random things.
Indeed, and all of those are preserved if views always return views,
strings always return strings, and one uses the standard constructors
to convert between them; e.g. str(view) -> str and view(str) -> view.
Rather than guessing, during the implementation of views, which type
would be the correct one to return, always return a view when operating
on views; it's a constant-time operation per view returned, and if the
user really wants a string, they can always call str on the returned
values.
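
To make the "views return views" rule concrete, here is a minimal sketch in
Python. The class name StrView and everything inside it are hypothetical and
are not the Pyrex implementation mentioned in this thread; the sketch only
handles step-1 slices, find() and partition(), and the only place it copies
data is str(view).

```python
class StrView:
    """Hypothetical read-only window onto part of a str; nothing is copied."""

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __str__(self):
        # The one place where a real string copy is made.
        return self._base[self._start:self._stop]

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch supports slices only")
        start, stop, step = index.indices(len(self))
        if step != 1:
            raise ValueError("this sketch supports step 1 only")
        stop = max(stop, start)
        # Slicing a view yields another view in constant time.
        return StrView(self._base, self._start + start, self._start + stop)

    def find(self, sub):
        # Search the original data in place, then translate the offset.
        pos = self._base.find(sub, self._start, self._stop)
        return -1 if pos < 0 else pos - self._start

    def partition(self, sep):
        # Same contract as str.partition, but every piece is a view.
        pos = self.find(sep)
        if pos < 0:
            return self, self[0:0], self[0:0]
        return self[:pos], self[pos:pos + len(sep)], self[pos + len(sep):]


view = StrView("key=value")
head, sep, tail = view.partition("=")
print(str(head), str(tail))
```

A caller who wants real strings pays for the copy explicitly, e.g.
str(head), which is the str(view) -> str conversion described above.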
> For some things, such as partition(), it might be worth
> having a variant that returned views instead of new strings.
> But it would be named differently, so you'd still know
> whether you were getting a view or not.
But wouldn't it be confusing if some methods on views returned views,
while others returned strings? Wouldn't it make more sense if methods
on an object, generally, returned instances of the same type (when it
made sense)? This seems to be the case with almost every other object
available in the Python standard library, with the notable exceptions of
buffer and mmap.
The slicing operations on mmaps make sense, as only recently did mmaps
gain the ability to map partial files not starting from the beginning,
but I'm not sure how well an operating system would handle overlapping
mmaps in the same process (especially during a larger mmap free; that
could bork the heap address space).
For buffer? I don't know. Buffer lacks basically every operation that
I use on a string, so I have had little use for it except as a way of
virtually slicing mmaps (for operations where I don't want to pass an
offset argument) and handling socket writing of large blocks of data
that it doesn't make sense to pre-slice*.
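
As an illustration of "virtually slicing" an mmap, here is a small sketch.
It assumes a modern Python and uses memoryview as a stand-in for the buffer
type discussed here; the anonymous 16-byte mapping and the offsets are
made up for the example.

```python
import mmap

# Anonymous mapping used only for illustration; a real use would map a file.
mm = mmap.mmap(-1, 16)
mm[:] = b"0123456789abcdef"

# A window onto bytes 4..9 of the mapping; no data is copied to create it.
with memoryview(mm) as whole, whole[4:10] as window:
    window_length = len(window)
    chunk = bytes(window)  # copy only at the point a real bytes object is needed

# Both views are released on leaving the with block, so the mapping can close.
mm.close()
print(window_length, chunk)
```

The window can be handed to code that accepts any bytes-like object, so no
separate offset argument has to be threaded through.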
> I'm not personally advocating one approach or the other
> here -- just pointing out an alternative that might be
> more acceptable to the BDFL.
Thank you for the input (and thank you for Pyrex, it's making writing
the view object quite easy),
* Arguably it never makes sense to pre-slice; connection speeds can vary
so significantly that choosing a slice too small results in poor speeds
and high numbers of system calls, and slices that are too large
result in further slicing. Buffers or their equivalents win by a large
margin. One trick is to slice the buffer (turning it into a string)
when over half of the original string has been written. This results in
using at most 2x the minimum amount of memory necessary, while also
guaranteeing that you will only ever slice as much as the minimum
pre-slicing operation would necessitate.
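
The half-written trick from the footnote can be sketched roughly as follows.
This assumes a modern Python with bytes and memoryview in place of the
string and buffer types discussed here; the names OutgoingBuffer and
flush_some are hypothetical.

```python
class OutgoingBuffer:
    """Hypothetical send queue that re-slices only after passing the halfway mark."""

    def __init__(self, data):
        self._data = bytes(data)
        self._sent = 0  # bytes of self._data already written

    def __len__(self):
        return len(self._data) - self._sent

    def pending(self):
        # Zero-copy window onto the unsent tail.
        return memoryview(self._data)[self._sent:]

    def consume(self, count):
        self._sent += count
        # Copy the tail only once more than half of the held data is written,
        # so the held data is never more than twice the unsent data.
        if self._sent > len(self._data) // 2:
            self._data = self._data[self._sent:]
            self._sent = 0


def flush_some(sock, buf):
    """Write whatever the socket accepts right now and record the progress."""
    if len(buf):
        buf.consume(sock.send(buf.pending()))
```

Each re-slice copies less than half of what was held before it, so the total
copied over the life of the buffer stays below the size of the original data.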