
Hey folks,

https://bitbucket.org/mikekap/pypy/commits/b774ae0be11b2012852a175f4bae44841... has an implementation of list slicing that copies the data on write (the third idea from http://doc.pypy.org/en/latest/project-ideas.html). It's a first pass (and also my first time working on the PyPy codebase), so I wanted to solicit some feedback. I'm curious whether this was even the right direction and whether I'm breaking or slowing something down without realizing it.

Also, would anyone happen to know some representative stress/performance tests I could run? I ran some simple tests myself (some things got slightly faster), but I doubt that's enough :)

Thanks,
Mike

(Aside: there is a pull request @ https://bitbucket.org/pypy/pypy/pull-request/282/add-a-copy-on-write-slice-l... for this commit, but I clearly messed something up with hg - the diff is from an earlier copy and bitbucket doesn't seem to want to pick it up.)
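(For readers unfamiliar with the idea: the sketch below is a hypothetical pure-Python illustration of copy-on-write slicing, not Mike's actual patch, which lives in PyPy's RPython list implementation. The `CowSlice` name and structure are invented for illustration: a slice initially just records a reference to the backing list plus bounds, and only copies the items out on the first write.)

```python
class CowSlice:
    """Illustrative copy-on-write slice: shares the backing list
    until the first mutation, then makes a private copy."""

    def __init__(self, backing, start, stop):
        self._backing = backing
        self._start = start
        self._stop = stop
        self._own = None  # private copy, created lazily on first write

    def _materialize(self):
        # Copy-on-write step: take a real copy of just our range.
        if self._own is None:
            self._own = self._backing[self._start:self._stop]

    def __len__(self):
        if self._own is not None:
            return len(self._own)
        return self._stop - self._start

    def __getitem__(self, i):
        # Reads never copy; they go to the private copy if one exists,
        # otherwise straight into the shared backing list.
        if self._own is not None:
            return self._own[i]
        return self._backing[self._start + i]

    def __setitem__(self, i, value):
        self._materialize()  # first write triggers the copy
        self._own[i] = value


big = list(range(10))
s = CowSlice(big, 2, 5)
print([s[i] for i in range(len(s))])  # reads share big's storage
s[0] = 99                             # first write copies items 2..4
print(s[0], big[2])                   # slice changed, original untouched
```

Writes to the slice never leak back into the original list, and a slice that is only read never pays for a copy.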

Hi Mike,

A good test suite is the PyPy benchmark suite (https://bitbucket.org/pypy/benchmarks), which is relatively comprehensive and which we run nightly. If you run into trouble running it, please pop into #pypy on freenode and we can help :-)

On Tue, Jan 20, 2015 at 6:26 AM, Mike Kaplinskiy <mike.kaplinskiy@gmail.com> wrote:

Hi Mike, On 20 January 2015 at 05:26, Mike Kaplinskiy <mike.kaplinskiy@gmail.com> wrote:
One hard bit about implementing this kind of change is making sure you don't accidentally slow some other kinds of code down. This requires running all our benchmarks, at least, as fijal pointed out. Here are some additional issues to consider.

We have to consider the overhead in terms of memory usage: it's probably fine if the overhead is only one pointer per list object (which often points to some fixed tuple like (0, None, None)). However, if you do "x = large_list[5:7]" you keep the whole "large_list" alive as long as "x" is alive. This might be an issue in some cases.

Resolving it is possible but more involved. It would probably require GC support --- i.e. we can't really solve this nicely in regular RPython as it is now. The details need discussion, but I can think for example of a way to tell the GC "this is a pointer to the full list, but I'm only going to access this range of items, so ignore the rest". Another way would be to have some callback that copies just the items needed out of the large list, but that's full of open questions...

A bientôt,
Armin.
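(The retention issue Armin describes can be demonstrated at the Python level with a small, invented sketch. `naive_cow_slice` below is hypothetical and just models a slice that keeps a reference into its backing list; the `Canary` object stands in for the memory of the large list, since plain lists can't be weakly referenced in CPython.)

```python
import gc
import weakref


class Canary:
    """Weak-referenceable marker stored inside the big list, so we can
    observe whether the list's contents are still alive."""


def naive_cow_slice(lst, start, stop):
    # Models a copy-on-write slice: no copy, just a reference
    # into the backing list plus the bounds.
    return (lst, start, stop)


canary = Canary()
big = [canary] + list(range(999))
ref = weakref.ref(canary)

view = naive_cow_slice(big, 5, 7)  # the "x = large_list[5:7]" case
del big, canary
gc.collect()
# The two-item view keeps all 1000 elements reachable.
print(ref() is not None)

# An eager copy of just the needed range releases the rest.
eager = view[0][view[1]:view[2]]
del view
gc.collect()
print(ref() is None)
```

This is exactly the trade-off in the mail: the lazy view is cheap to create but pins the whole backing list; copying the needed items out (whether eagerly or via the GC callback Armin sketches) frees the rest at the cost of a copy.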

participants (3)
- Armin Rigo
- Maciej Fijalkowski
- Mike Kaplinskiy