list comprehension help
Alex Martelli
aleax at mac.com
Sun Mar 18 19:52:38 EDT 2007
George Sakkis <george.sakkis at gmail.com> wrote:
...
> > Unless each line is huge, how exactly you split it to get the first and
> > last blank-separated word is not going to matter much.
> >
> > Still, you should at least avoid repeating the splitting twice, that's
> > pretty obviously sheer waste: so, change that loop body to:
> >
> > words = line.split(' ')
> > db[words[0]] = words[-1]
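A minimal sketch of the whole loop built that way (the sample lines and the db dict are just illustrative, not from the original 4GB file):

```python
# Map each line's first blank-separated word to its last one,
# splitting each line only once instead of twice.
db = {}
lines = [
    "alpha beta gamma",
    "one two three four",
]
for line in lines:
    words = line.split(' ')
    db[words[0]] = words[-1]

print(db)  # {'alpha': 'gamma', 'one': 'four'}
```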
> >
> > If some lines are huge, splitting them entirely may be far more work
> > than you need. In this case, you may do two partial splits instead, one
> > direct and one reverse:
> >
> > first_word = line.split(' ', 1)[0]
> > last_word = line.rsplit(' ', 1)[-1]
> > db[first_word] = last_word
>
> I'd guess the following is in theory faster, though it might not make
> a measurable difference:
>
> first_word = line[:line.index(' ')]
> last_word = line[line.rindex(' ')+1:]
> db[first_word] = last_word
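A quick sanity check that the two approaches agree (sample line made up for illustration): the slicing version never builds a list at all, it just finds the first/last blank and slices.

```python
line = "first middle words here last"

# Partial splits: at most one split from each end of the string.
first_split = line.split(' ', 1)[0]
last_split = line.rsplit(' ', 1)[-1]

# Slicing via index/rindex: no intermediate list is created.
first_slice = line[:line.index(' ')]
last_slice = line[line.rindex(' ') + 1:]

assert first_split == first_slice == "first"
assert last_split == last_slice == "last"
```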
If the lines are huge, the difference is quite measurable:
brain:~ alex$ python -mtimeit -s"line='ciao '*999" "first=line.split(' ',1)[0]; line=line.rstrip(); second=line.rsplit(' ',1)[-1]"
100000 loops, best of 3: 3.95 usec per loop
brain:~ alex$ python -mtimeit -s"line='ciao '*999" "first=line[:line.index(' ')]; line=line.rstrip(); second=line[line.rindex(' ')+1:]"
1000000 loops, best of 3: 1.62 usec per loop
brain:~ alex$
So, if the 4GB file was made up, say, of 859853 such lines, using the
index/rindex approach might save a couple of seconds overall.
Omitting the ,1 from the split/rsplit calls (i.e., essentially, the code as originally posted) brings the snippet time up to 226 microseconds; against that baseline, the speedup might therefore be a couple HUNDRED seconds in all.
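The same comparison can be reproduced from within Python with the timeit module (absolute numbers will of course vary by machine and interpreter version):

```python
import timeit

setup = "line = 'ciao ' * 999"

# Partial splits from each end: builds two small lists per iteration.
t_split = timeit.timeit(
    "first = line.split(' ', 1)[0]; "
    "s = line.rstrip(); second = s.rsplit(' ', 1)[-1]",
    setup=setup, number=100000)

# index/rindex plus slicing: no list is built at all.
t_index = timeit.timeit(
    "first = line[:line.index(' ')]; "
    "s = line.rstrip(); second = s[s.rindex(' ') + 1:]",
    setup=setup, number=100000)

print(t_split, t_index)
```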
Alex
More information about the Python-list mailing list