Re: [Python-ideas] Add a datatype collections.ListView
On Apr 20, 2014 3:02 AM, "Tal Einat"
FYI your handling of negative indexes is buggy.
It's horrible! I'm not quite sure it can even be called "handling." Take my example only as a prototype of the concept that lets me run the benchmark. That said, doing the negative indices right should just be some arithmetic on self.start and self.stop, not anything too complicated or differently performing.
On Thu, Apr 17, 2014 at 6:49 PM, David Mertz
wrote: Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was
The basic idea in my program is that I have a large(ish) list of tokens
struck by increased guilt over a program I am developing. I spoke with him in the hallway about a good approach to improving performance while keeping code nice looking. that I generate in parsing a special language, and I frequently take slices from this list--and often enough, slices out of those slices. In some cases, the most straightforward approach in code is a slice-to-end, e.g.:
start = find_thing(tokens) construct_in = tokens[start:]
Obviously, this winds up doing a lot of copying (of object references,
not of actual data, but still). Often the above step is followed by something like:
end = find_end(construct_in) construct = construct_in[:end]
This second step isn't bad in my case since the actual construct will be
dozens of tokens, not thousands or millions, and once I find it I want to keep it around and process it further.
I realize, of course, that I could program my 'find_end()' differently
so that it took a signature more like 'find_end(tokens, start=start)'. But with recursion and some other things I do, this becomes inelegant.
What I'd really like is a "ListView" that acts something like NumPy's
non-copying slices. However numpy, of course, only deals with arrays and matrices of uniform numeric types. I want a non-copying "slice" of a list of generic Python objects. Moreover, I believe that such a type is useful enough to be worth including in the collections module generally.
As an initial implementation, I created the below. In the module
self-tests, the performance increase is about 100x, but the particular ad-hoc benchmark I wrote isn't necessarily well-representative of all use-cases.
% python3 ListView.py A bunch of slices from list: 1.29 seconds A bunch of slices from DummyListView: 1.19 seconds A bunch of slices from ListView: 0.01 seconds ------------------------------------------------------------------- ### ListView.py ### import sys from time import time from random import randint, random from collections import Sequence
class DummyListView(Sequence): def __init__(self, l): self.list = l def __len__(self): return len(self.list) def __getitem__(self, i): return self.list[i]
class ListView(Sequence): def __init__(self, seq, start=0, stop=None): if hasattr(seq, '__getitem__'): self.list = seq else: self.list = list(seq) self.start = start self.stop = len(self.list) if stop is None else stop
def __len__(self): return self.stop - self.start
def __getitem__(self, i): if isinstance(i, slice): start = self.start if i.start is None else self.start+i.start if i.stop is None: stop = self.stop else: stop = self.start + i.stop return ListView(self.list, start, stop) else: val = self.list[i+self.start] if i < 0: return val elif not self.start <= i+self.start < self.stop: raise IndexError("View of sequence [%d:%d], index %d" % ( self.start, self.stop, i)) return val
def __str__(self): return "ListView of %d item list, slice [%d:%d]" % ( len(self.list), self.start, self.stop)
def __repr__(self): return "ListView(%s)" % self.list[self.start:self.stop]
def to_list(self): return list(self.list[self.start:self.stop])
class Thing(object): def __init__(self, x): self.x = x def __repr__(self): return "Thing(%f)" % self.x
if __name__ == '__main__': NUM = 100000 things = [Thing(random()) for _ in range(NUM)] slices = [sorted((randint(0, NUM-1), randint(0, NUM-1))) for _ in range(100)] offset = randint(0, 100) for name, cls in (("list", list), ("DummyListView", DummyListView), ("ListView",ListView)): begin = time() s = "A bunch of slices from %s: " % name print(s.rjust(38), end='') sys.stdout.flush() l = cls(things) for i in range(8): for start, stop in slices: sl = l[start:stop] size = stop-start for i in range(3): subslice1 = sl[:offset] subslice2 = sl[offset:] print("%0.2f seconds" % (time()-begin))
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
participants (1)
-
David Mertz