<div dir="ltr"><div>Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was struck by increased guilt over a program I am developing. I spoke with him in the hallway about a good approach to improving performance while keeping code nice looking.</div>
<div><br></div><div>The basic idea in my program is that I have a large(ish) list of tokens that I generate in parsing a special language, and I frequently take slices from this list--and often enough, slices out of those slices. In some cases, the most straightforward approach in code is a slice-to-end, e.g.:</div>
<div><br></div><div> start = find_thing(tokens)</div><div> construct_in = tokens[start:]</div><div><br></div><div>Obviously, this winds up doing a lot of copying (of object references, not of actual data, but still). Often the above step is followed by something like:</div>
<div><br></div><div> end = find_end(construct_in)</div><div> construct = construct_in[:end]</div><div><br></div><div>This second step isn't bad in my case since the actual construct will be dozens of tokens, not thousands or millions, and once I find it I want to keep it around and process it further. </div>
<div><br></div><div>I realize, of course, that I could program my 'find_end()' differently so that it took a signature more like 'find_end(tokens, start=start)'. But with recursion and some other things I do, this becomes inelegant.</div>
<div><br></div><div>What I'd really like is a "ListView" that acts something like NumPy's non-copying slices. However numpy, of course, only deals with arrays and matrices of uniform numeric types. I want a non-copying "slice" of a list of generic Python objects. Moreover, I believe that such a type is useful enough to be worth including in the collections module generally.</div>
<div><br></div><div>As an initial implementation, I created the below. In the module self-tests, the performance increase is about 100x, but the particular ad-hoc benchmark I wrote isn't necessarily well-representative of all use-cases.</div>
<div><br></div><div><div><font face="courier new, monospace">% python3 ListView.py</font></div><div><font face="courier new, monospace"> A bunch of slices from list: 1.29 seconds</font></div><div><font face="courier new, monospace">A bunch of slices from DummyListView: 1.19 seconds</font></div>
<div><font face="courier new, monospace"> A bunch of slices from ListView: 0.01 seconds</font></div></div><div>-------------------------------------------------------------------</div><div><font face="courier new, monospace">### ListView.py ###</font></div>
<div><div><font face="courier new, monospace">import sys</font></div><div><font face="courier new, monospace">from time import time</font></div><div><font face="courier new, monospace">from random import randint, random</font></div>
<div><font face="courier new, monospace">from collections import Sequence</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">class DummyListView(Sequence):</font></div>
<div><font face="courier new, monospace"> def __init__(self, l):</font></div><div><font face="courier new, monospace"> self.list = l</font></div><div><font face="courier new, monospace"> def __len__(self):</font></div>
<div><font face="courier new, monospace"> return len(self.list)</font></div><div><font face="courier new, monospace"> def __getitem__(self, i):</font></div><div><font face="courier new, monospace"> return self.list[i]</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">class ListView(Sequence):</font></div><div><font face="courier new, monospace"> def __init__(self, seq, start=0, stop=None):</font></div>
<div><font face="courier new, monospace"> if hasattr(seq, '__getitem__'):</font></div><div><font face="courier new, monospace"> self.list = seq</font></div><div><font face="courier new, monospace"> else:</font></div>
<div><font face="courier new, monospace"> self.list = list(seq)</font></div><div><font face="courier new, monospace"> self.start = start</font></div><div><font face="courier new, monospace"> self.stop = len(self.list) if stop is None else stop</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"> def __len__(self):</font></div><div><font face="courier new, monospace"> return self.stop - self.start</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"> def __getitem__(self, i):</font></div><div><font face="courier new, monospace"> if isinstance(i, slice):</font></div>
<div><font face="courier new, monospace"> start = self.start if i.start is None else self.start+i.start</font></div><div><font face="courier new, monospace"> if i.stop is None:</font></div><div><font face="courier new, monospace"> stop = self.stop</font></div>
<div><font face="courier new, monospace"> else:</font></div><div><font face="courier new, monospace"> stop = self.start + i.stop</font></div><div><font face="courier new, monospace"> return ListView(self.list, start, stop)</font></div>
<div><font face="courier new, monospace"> else:</font></div><div><font face="courier new, monospace"> val = self.list[i+self.start]</font></div><div><font face="courier new, monospace"> if i < 0:</font></div>
<div><font face="courier new, monospace"> return val</font></div><div><font face="courier new, monospace"> elif not self.start <= i+self.start < self.stop:</font></div><div><font face="courier new, monospace"> raise IndexError("View of sequence [%d:%d], index %d" % (</font></div>
<div><font face="courier new, monospace"> self.start, self.stop, i))</font></div><div><font face="courier new, monospace"> return val</font></div><div><font face="courier new, monospace"><br>
</font></div><div><font face="courier new, monospace"> def __str__(self):</font></div><div><font face="courier new, monospace"> return "ListView of %d item list, slice [%d:%d]" % (</font></div><div><font face="courier new, monospace"> len(self.list), self.start, self.stop)</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"> def __repr__(self):</font></div><div><font face="courier new, monospace"> return "ListView(%s)" % self.list[self.start:self.stop]</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"> def to_list(self):</font></div><div><font face="courier new, monospace"> return list(self.list[self.start:self.stop])</font></div>
</div><div><font face="courier new, monospace"><br></font></div><div><div><font face="courier new, monospace">class Thing(object):</font></div><div><font face="courier new, monospace"> def __init__(self, x):</font></div>
<div><font face="courier new, monospace"> self.x = x</font></div><div><font face="courier new, monospace"> def __repr__(self):</font></div><div><font face="courier new, monospace"> return "Thing(%f)" % self.x</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">if __name__ == '__main__':</font></div><div><font face="courier new, monospace"> NUM = 100000</font></div><div>
<font face="courier new, monospace"> things = [Thing(random()) for _ in range(NUM)]</font></div><div><font face="courier new, monospace"> slices = [sorted((randint(0, NUM-1), randint(0, NUM-1)))</font></div><div><font face="courier new, monospace"> for _ in range(100)]</font></div>
<div><font face="courier new, monospace"> offset = randint(0, 100)</font></div><div><font face="courier new, monospace"> for name, cls in (("list", list),</font></div><div><font face="courier new, monospace"> ("DummyListView", DummyListView),</font></div>
<div><font face="courier new, monospace"> ("ListView",ListView)):</font></div><div><font face="courier new, monospace"> begin = time()</font></div><div><font face="courier new, monospace"> s = "A bunch of slices from %s: " % name</font></div>
<div><font face="courier new, monospace"> print(s.rjust(38), end='')</font></div><div><font face="courier new, monospace"> sys.stdout.flush()</font></div><div><font face="courier new, monospace"> l = cls(things)</font></div>
<div><font face="courier new, monospace"> for i in range(8):</font></div><div><font face="courier new, monospace"> for start, stop in slices:</font></div><div><font face="courier new, monospace"> sl = l[start:stop]</font></div>
<div><font face="courier new, monospace"> size = stop-start</font></div><div><font face="courier new, monospace"> for i in range(3):</font></div><div><font face="courier new, monospace"> subslice1 = sl[:offset]</font></div>
<div><font face="courier new, monospace"> subslice2 = sl[offset:]</font></div><div><font face="courier new, monospace"> print("%0.2f seconds" % (time()-begin))</font></div></div><br clear="all">
<div><br></div>-- <br>Keeping medicines from the bloodstreams of the sick; food <br>from the bellies of the hungry; books from the hands of the <br>uneducated; technology from the underdeveloped; and putting <br>advocates of freedom in prisons. Intellectual property is<br>
to the 21st century what the slave trade was to the 16th.<br>
</div>