[Tutor] binary trees/ linked lists

Magnus Lycka magnus@thinkware.se
Thu, 03 Oct 2002 21:54:12 +0200


At 16:34 2002-10-03 +0100, alan.gauld@bt.com wrote:
>OK, I just meant that everywhere you use a linked
>list you could use a Python list! ie you can grow
>them dynamically, insert in the middle, find
>elements, traverse in either direction etc.
>
>The internal implementation is something I encourage
>people not to think about. If I want to do that I
>can work in C++!! :-)

Agreed!

The problem occurs if you have big lists and high
performance requirements. As I mentioned, they have
different sweet spots and bottle necks. In 99% of
the cases it doesn't matter, but sometimes there is
a huge difference in how Python lists behave compared
to a linked list. E.g.

 >>> from time import clock as t
 >>> def measure(f, n):
...     start =3D t()
...     for i in range(n): f()
...     return t() - start
...
 >>> def addFirst():
...     l.insert(0,0)
...
 >>> def addLast():
...     l.append(0)
...
 >>> l =3D range(1000)
 >>> measure(addFirst, 100)/measure(addLast, 100)
2.4258241758658317
 >>> l =3D range(10000)
 >>> measure(addFirst, 100)/measure(addLast, 100)
89.448554134709681
 >>> l =3D range(100000)
 >>> measure(addFirst, 100)/measure(addLast, 100)
469.85452162937037
 >>> l =3D range(1000000)
 >>> measure(addFirst, 100)/measure(addLast, 100)
4732.36800513662

For a real linked list, there would not be any
difference if we added an element in the beginning
or the end of the list.

On the other hand, it would take proportional to
the length of the list to find the right place
to insert data if we for instance want to insert
data in sorted order.

Then a BTree would be much better--and there are
(as I wrote) BTrees for Python in ZODB.

The algorithms themselves are really rather
uninteresting in real life. You never really
need a linked list unless someone tells you to
show a linked list.

You might need a stack, or a queue, or a priority
queue or a set. These are data collections with
various practical qualities. A linked list is really
just a means to achieving something like that.

Since deleting and adding in the "bottom" of a
python list is equally bad, a queue implemented
with a python list, while trivial to implement,
will scale badly. A linked list would be a better
basis for implementing a queue.

But who cares? Python has a queue module in its
standard lib...

import Queue
help(Queue)

While we'll probably make better software from
the start the more we know about technical details
and libraries, python programs are typically simple
to modify.

So, by all means, start out with a list whether
it's the optimal data type or not. Chances are good
that the bottleneck will be somewhere else, and if
you made your application more complex to avoid a
non-existent performance problem, you have just
wasted time and made software maintenance more
costly.

The trivial queue is built like this:

 >>> class Q(list):
...     def push(self, x):
...             self.insert(0, x)
...
 >>> q =3D Q()
 >>> q.push(3)
 >>> q.push('A')
 >>> q.push('Z')
 >>> print q
['Z', 'A', 3]
 >>> q.pop()
3
 >>> q.pop()
'A'
 >>> q.pop()
'Z'
 >>> q.pop()
Traceback (most recent call last):
   File "<interactive input>", line 1, in ?
IndexError: pop from empty list

This will work well for small queues/FIFOs. But as
you noticed in the benchmarking above, it will start
to get slow with many thousands of elements...


--=20
Magnus Lyck=E5, Thinkware AB
=C4lvans v=E4g 99, SE-907 50 UME=C5
tel: 070-582 80 65, fax: 070-612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se