[Python-ideas] Add a datatype collections.ListView
David Mertz
mertz at gnosis.cx
Thu Apr 17 17:49:45 CEST 2014
Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was
struck by increased guilt over a program I am developing. I spoke with him
in the hallway about a good approach to improving performance while keeping
code nice looking.
The basic idea in my program is that I have a large(ish) list of tokens
that I generate in parsing a special language, and I frequently take slices
from this list--and often enough, slices out of those slices. In some
cases, the most straightforward approach in code is a slice-to-end, e.g.:
start = find_thing(tokens)
construct_in = tokens[start:]
Obviously, this winds up doing a lot of copying (of object references, not
of actual data, but still). Often the above step is followed by something
like:
end = find_end(construct_in)
construct = construct_in[:end]
This second step isn't bad in my case since the actual construct will be
dozens of tokens, not thousands or millions, and once I find it I want to
keep it around and process it further.
I realize, of course, that I could program my 'find_end()' differently so
that it took a signature more like 'find_end(tokens, start=start)'. But
with recursion and some other things I do, this becomes inelegant.
What I'd really like is a "ListView" that acts something like NumPy's
non-copying slices. However numpy, of course, only deals with arrays and
matrices of uniform numeric types. I want a non-copying "slice" of a list
of generic Python objects. Moreover, I believe that such a type is useful
enough to be worth including in the collections module generally.
As an initial implementation, I created the below. In the module
self-tests, the performance increase is about 100x, but the particular
ad-hoc benchmark I wrote isn't necessarily well-representative of all
use-cases.
% python3 ListView.py
A bunch of slices from list: 1.29 seconds
A bunch of slices from DummyListView: 1.19 seconds
A bunch of slices from ListView: 0.01 seconds
-------------------------------------------------------------------
### ListView.py ###
import sys
from time import time
from random import randint, random
from collections import Sequence
class DummyListView(Sequence):
def __init__(self, l):
self.list = l
def __len__(self):
return len(self.list)
def __getitem__(self, i):
return self.list[i]
class ListView(Sequence):
def __init__(self, seq, start=0, stop=None):
if hasattr(seq, '__getitem__'):
self.list = seq
else:
self.list = list(seq)
self.start = start
self.stop = len(self.list) if stop is None else stop
def __len__(self):
return self.stop - self.start
def __getitem__(self, i):
if isinstance(i, slice):
start = self.start if i.start is None else self.start+i.start
if i.stop is None:
stop = self.stop
else:
stop = self.start + i.stop
return ListView(self.list, start, stop)
else:
val = self.list[i+self.start]
if i < 0:
return val
elif not self.start <= i+self.start < self.stop:
raise IndexError("View of sequence [%d:%d], index %d" % (
self.start, self.stop, i))
return val
def __str__(self):
return "ListView of %d item list, slice [%d:%d]" % (
len(self.list), self.start, self.stop)
def __repr__(self):
return "ListView(%s)" % self.list[self.start:self.stop]
def to_list(self):
return list(self.list[self.start:self.stop])
class Thing(object):
def __init__(self, x):
self.x = x
def __repr__(self):
return "Thing(%f)" % self.x
if __name__ == '__main__':
NUM = 100000
things = [Thing(random()) for _ in range(NUM)]
slices = [sorted((randint(0, NUM-1), randint(0, NUM-1)))
for _ in range(100)]
offset = randint(0, 100)
for name, cls in (("list", list),
("DummyListView", DummyListView),
("ListView",ListView)):
begin = time()
s = "A bunch of slices from %s: " % name
print(s.rjust(38), end='')
sys.stdout.flush()
l = cls(things)
for i in range(8):
for start, stop in slices:
sl = l[start:stop]
size = stop-start
for i in range(3):
subslice1 = sl[:offset]
subslice2 = sl[offset:]
print("%0.2f seconds" % (time()-begin))
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140417/ee9462e1/attachment.html>
More information about the Python-ideas
mailing list