[Tutor] List comprehensions to search a list--amazing!

Steven D'Aprano steve at pearwood.info
Tue Mar 24 03:21:26 CET 2015


On Mon, Mar 23, 2015 at 08:42:23PM -0500, boB Stepp wrote:
> On Thu, Mar 19, 2015 at 12:10 AM, Dave Angel <davea at davea.name> wrote:
> > The catch to a list comprehension is it has to visit all the elements, while
> > a binary search would visit log-base-2 of them.  So instead of 10000
> > elements, you'd be searching about 14 items.
> 
> I suspected as much, but had not verified this. Nonetheless, this may
> prove sufficiently fast. I will have to test this with my final data
> files. Right now I am using test cases, while I continue to design,
> check, rewrite, etc.
> 
> > For large lists, it'd probably be much quicker to use the bisect module.
> > https://docs.python.org/3.4/library/bisect.html
> 
> Can you give me a ballpark number for "large", where this would start
> making a meaningful difference?

Tell us what you consider a meaningful difference :-)

What counts as "too slow" will depend on:

- what you are doing
- how often you are doing it
- what else you're doing at the same time
- what hardware you have to run it on
- whether you are running the program only once, or repeatedly

etc. E.g. taking 30 seconds to iterate over a billion items in a list is 
insignificant if this is part of a bigger program which takes twenty 
minutes to run, but critical if you are hoping to run the program 
thirty times a second, hundreds of times a day.

But I've never heard anyone complain that a program was too fast :-)

(Actually, that's not quite true. Sometimes if the user is expecting 
a program function to take a while, say "Rebuild database", and it 
actually runs near instantaneously, it is indeed *too fast* because it 
can give the user the idea that the rebuild function isn't working. But 
that's a UI issue, not a speed issue.)

But if you twist my arm, and force me to pluck some round numbers from 
thin air, I would guess:

- for under ten items, a linear search will be insignificantly faster;

- for under a hundred items, there's no meaningful difference;

- for under a million items, binary search will be typically better but 
a linear search still acceptable ("everything is fast for small N");

- for over a million items, linear search will typically be 
unacceptably slow.

For certain definitions of what's acceptable or not :-)



-- 
Steve


More information about the Tutor mailing list