Complexity of standard Python data structures (Re: Speed of string += string)

Marcus Alanen marcus at infa.abo.fi
Mon Apr 14 09:47:22 EDT 2003


On Mon, 14 Apr 2003 10:53:34 GMT, Alex Martelli <aleax at aleax.it> wrote:
>Marcus Alanen wrote:
>> Indeed, that was the first thing I wondered about when I started
>> learning Python.
>Interesting!  What other languages (besides C++ for a subset of its
>standard libraries) give O() specifications for their operations?
>In other words, what other languages did you learn, before Python,
>that didn't leave you wondering about such performance issues?

It must've been a combination of issues:

1) Coming from a C/C++ background, I want to use data structures
efficiently.  This falls into the "premature optimization" category. So
sue me.

2) The project I was doing could in time grow to handle quite large
sets of data, so algorithmic efficiency was a concern.

3) Python's "list" had an insert() routine, but also a __getitem__
routine. Both probably couldn't be O(1), so I got a bit confused.
I didn't want to pick a "slow" method without realizing it.

4) And hey, I did want to learn the language properly.

Number 3 was quite irritating, since C++ "list" is a doubly-linked
list, while Python "list" is what C++ calls a "vector".

Probably not very good excuses (except number 4), but I'm just describing
what I felt at the time. That said, I don't understand why the
Python documentation can't simply say "O(N)" or "amortized O(1)"
for the various methods.
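The asymmetry behind point 3 is easy to demonstrate. A hypothetical micro-benchmark (not from the original post; written in modern Python 3 syntax) comparing append, which is amortized O(1), with insert at position 0, which is O(N) because every existing element must shift right:

```python
import timeit

def grow_by_append(n):
    xs = []
    for i in range(n):
        xs.append(i)        # amortized O(1) per call
    return xs

def grow_by_insert_front(n):
    xs = []
    for i in range(n):
        xs.insert(0, i)     # O(len(xs)) per call -> O(n**2) overall
    return xs

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        t_app = timeit.timeit(lambda: grow_by_append(n), number=10)
        t_ins = timeit.timeit(lambda: grow_by_insert_front(n), number=10)
        # Doubling n roughly doubles the append time but roughly
        # quadruples the insert-front time.
        print(n, t_app, t_ins)
```

Both are methods on the same object, so nothing in the interface hints that one of them scales quadratically when used in a loop.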

>> Others answered this very thoroughly, I'd just like to emphasize the
>> linearity of keys() --- most often you can manage with .iterkeys() or
>> .itervalues() just fine, and it's a lot faster.

>As for the performance issues, I wouldn't be so sure.  E.g.:

Sorry, I really didn't explain that very well. In several cases I have
had, I don't need to traverse the whole dict, so iterkeys() saves
some memory, and possibly time, by not building the full key list.
Naturally, in the general case both visit all elements and are linear
in time.
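The early-exit case can be sketched as follows (a hypothetical example in modern Python, where iterating a dict directly is lazy, just as d.iterkeys() was in Python 2, while Python 2's d.keys() built the entire list up front):

```python
def find_first_key(d, predicate):
    # Lazy iteration: stops as soon as a match is found, without
    # materializing the full key list first.
    for k in d:             # like d.iterkeys() in Python 2
        if predicate(k):
            return k
    return None

d = {i: i * i for i in range(1_000_000)}
# Early exit: only a handful of the million keys are visited
# (dicts preserve insertion order in Python 3.7+).
print(find_first_key(d, lambda k: k > 10))  # -> 11
```

With the eager keys() of Python 2, the million-element list would be built even if the very first key matched.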

By the way, I tested a similar program with 10 million elements and
saw a >15% speed gain with iterkeys():

d = {}
for i in xrange(10000000):
    d[i] = i

import time

for trial in xrange(3):
    # iterkeys(): lazy iteration, no intermediate list
    st = time.time()
    for k in d.iterkeys():
        pass
    et = time.time()
    print et - st

    # keys(): builds the full list of keys first, then iterates it
    st = time.time()
    for k in d.keys():
        pass
    et = time.time()
    print et - st

Definitely not a significant performance difference, though.
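For what it's worth, the standard timeit module gives a more controlled way to run this kind of micro-benchmark. In modern Python keys() itself returns a lazy view, so the closest equivalent of the Python 2 comparison is iterating the dict directly versus first materializing list(d.keys()) (a sketch, with hypothetical names):

```python
import timeit

d = {i: i for i in range(1_000_000)}

def iterate_lazy():
    # Lazy: walks the hash table directly, no intermediate list.
    for k in d:
        pass

def iterate_list():
    # Eager: allocates a million-element list before the loop starts.
    for k in list(d.keys()):
        pass

t_lazy = timeit.timeit(iterate_lazy, number=3)
t_list = timeit.timeit(iterate_list, number=3)
print(t_lazy, t_list)
```

timeit handles repetition and avoids some of the noise a hand-rolled time.time() loop picks up.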

>That's most definitely NOT a significant performance difference, 
>of course, but calling iteration on iterkeys "a lot faster" than 
>iteration on keys would appear to be substantially misleading, at 
>least without a lot of qualifications and hedging!-)

True. Sorry for the confusion.

-- 
Marcus Alanen
maalanen at abo.fi
