preferring [] or () in list of error codes?

Carl Banks pavlovevidence at gmail.com
Tue Jun 9 12:16:33 EDT 2009


On Jun 9, 4:57 am, samwyse <samw... at gmail.com> wrote:
> On Jun 8, 8:57 pm, samwyse <samw... at gmail.com> wrote:
>
> > I conclude that using constructors is generally a bad idea, since the
> > compiler doesn't know if you're calling the builtin or something with
> > an overloaded name.  I presume that the compiler will eventually
> > optimize the second example to match the last, but both of them use
> > the BUILD_SET opcode.  I expect that this can be expensive for long
> > lists, so I don't think that it's a good idea to use set constants
> > inside loops.  Instead it should be assigned to a global or class
> > variable.
>
> Time to test things!   I'm going to compare three things using Python
> 3.0:
>   X={...}\nS=lambda x: x in X
>   S=lambda x: x in {...}
>   S=lambda x: x in (...)
> where the ... is replaced by lists of integers of various lengths.
> Here's the test bed:
>
> from random import seed, sample
> from timeit import Timer
> maxint = 2**31-1
> values = list(map(lambda n: 2**n-1, range(1,16)))
> def append_numbers(k, setup):
>     seed(1968740928)
>     for i in sample(range(maxint), k):
>         setup.append(str(i))
>         setup.append(',')
> print('==', 'separate set constant')
> for n in values[::2]:
>     print('===', n, 'values')
>     setup = ['X={']
>     append_numbers(n, setup)
>     setup.append('}\nS=lambda x: x in X')
>     t = Timer('S(88632719)', ''.join(setup))
>     print(t.repeat())
> print('==', 'in-line set constant')
> for n in values[:4]:
>     print('===', n, 'values')
>     setup = ['S=lambda x: x in {']
>     append_numbers(n, setup)
>     setup.append('}')
>     t = Timer('S(88632719)', ''.join(setup))
>     print(t.repeat())
> print('==', 'in-line list constant')
> for n in values:
>     print('===', n, 'values')
>     setup = ['S=lambda x: x in (']
>     append_numbers(n, setup)
>     setup.append(')')
>     t = Timer('S(88632719)', ''.join(setup))
>     print(t.repeat())

It looks like you are evaluating the list/set/tuple every pass, and
then, for lists and tuples, always indexing the first item.


> And here are the results.  There's something interesting at the very
> end.
[snip results showing virtually identical performance for list, set,
and tuple]
> You will note that testing against a list constant is just as fast as
> testing against a set.  This was surprising for me; apparently the
> __contains__ operator turns a tuple into a set.

Given the way you wrote the test it this is hardly surprising.

I would expect "item in list" to have comparable execution time to
"item in set" if item is always the first element in list.

Furthermore, the Python compiler appears to be optimizing this
specific case to always use a precompiled set.  Well, almost
always....


> You will also note
> that  performance to fall off drastically for the last set of values.
> I'm not sure what happens there; I guess I'll file a bug report.

Please don't; it's not a bug.  The slowdown is because at sizes above
a certain threshold the Python compiler doesn't try to precompile in-
line lists, sets, and tuples.  The last case was above that limit.


Carl Banks



More information about the Python-list mailing list