What is wrong in my list comprehension?

Mon Feb 2 16:41:33 EST 2009

On Feb 1, 3:37 am, Peter Otten <__pete... at web.de> wrote:
> Hussein B wrote:
> > Hey,
> > I have a log file that doesn't contain the word "Haskell" at all, I'm
> > just trying to do a little performance comparison:
> > ++++++++++++++
> > from datetime import time, timedelta, datetime
> > start = datetime.now()
> > print start
> > lines = [line for line in file('/media/sda4/Servers/Apache/
> > Tomcat-6.0.14/logs/catalina.out') if line.find('Haskell')]
> > print 'Number of lines contains "Haskell" = ' +  str(len(lines))
> > end = datetime.now()
> > print end
> > ++++++++++++++
> > Well, the script is returning the number of lines in the whole file!!
> > What is wrong in my logic?
> > Thanks.
>
> """
> find(...)
>     S.find(sub [,start [,end]]) -> int
>
>     Return the lowest index in S where substring sub is found,
>     such that sub is contained within s[start:end].  Optional
>     arguments start and end are interpreted as in slice notation.
>
>     Return -1 on failure.
> """
>
> a.find(b) returns -1 if b is not found. -1 evaluates to True in a boolean
> context.
>
> Use
>
> [line for line in open(...) if line.find("Haskell") != -1]
>
> or, better
>
> [line for line in open(...) if "Haskell" in line]
>
> to get the expected result.
>
> Peter
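
To see Peter's point concretely, here is a small sketch. The log line and
variable names are made up for illustration; only str.find and the "in"
operator come from the thread. It also shows a second problem with the
original filter: a match at index 0 is falsy, so such a line would be
dropped.

```python
# Hypothetical log line, standing in for one line of catalina.out.
line = "INFO: Server startup in 1234 ms"

# find() returns -1 when the substring is absent.
index = line.find("Haskell")

# -1 is non-zero, so the original "if line.find('Haskell')" keeps this line.
passes_buggy_filter = bool(index)

# The membership test gives the intended answer.
passes_fixed_filter = "Haskell" in line

# A line that starts with the word matches at index 0, which is falsy,
# so the original filter would wrongly discard it.
starts_with_match = "Haskell appears first"
dropped_by_buggy_filter = not bool(starts_with_match.find("Haskell"))
```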

Or better, count the matches with a generator expression:

sum(1 for line in open(...) if "Haskell" in line)

This yields 1 for each matching line and adds them up, so you avoid
allocating a new list holding every line that contains "Haskell".
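
A minimal sketch of the counting approach, assuming you only need the
number of matches and not the lines themselves. The function name and the
in-memory sample are hypothetical; in the real script you would pass the
open log file instead.

```python
import io

def count_matching_lines(lines, needle):
    """Count the lines that contain needle without building a list."""
    # The generator produces one 1 per matching line; sum() consumes
    # them as they are produced, so no intermediate list is allocated.
    return sum(1 for line in lines if needle in line)

# Hypothetical stand-in for the log file; any iterable of lines works.
sample_log = io.StringIO(u"INFO start\nWARN Haskell mentioned\nINFO done\n")
haskell_count = count_matching_lines(sample_log, u"Haskell")
```

Note that sum() needs numbers to add, which is why the generator yields 1
rather than the line itself.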

http://www.python.org/dev/peps/pep-0289/