How to remove subset from a file efficiently?
Raymond Hettinger
python at rcn.com
Sat Jan 14 07:38:01 EST 2006
> > b = set(file('/home/sajid/python/wip/stc/2/CBR0000333'))
> >
> > file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP0000333')))
> >
> > --
> > $ time ./cleanup_ray.py
> >
> > real 0m5.451s
> > user 0m4.496s
> > sys 0m0.428s
> >
> > (-: Damn! That saves a bit more time! Bravo!
> >
[bonono at gmail.com]
> Have you tried the explicit loop variant with psyco? My experience is
> that psyco is pretty good at optimizing for loops, which usually results
> in faster code than even the built-in map/filter variants.
>
> It would only be a 1 or 2 sec difference (given what you already
> have), so it may not be important, but it could be fun.
The code is pretty tight and is now most likely I/O-bound. If so,
further speed-ups will be hard to come by (even with psyco). The four
principal steps of reading, membership testing, filtering, and writing
are all C-coded methods that are linked directly together, with no
interpreter loop overhead or method lookups. Hard to beat.
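
For readers on Python 3, the quoted one-liner can be sketched roughly as follows. This is an illustration, not the code that was timed above: `open` and `itertools.filterfalse` stand in for the Python 2 `file` and `itertools.ifilterfalse`, and the function names and path parameters are hypothetical. An explicit-loop variant is included only for comparison; it should produce the same output, but no claim is made here about relative speed.

```python
import itertools


def remove_subset(subset_path, superset_path, out_path):
    """Copy superset_path to out_path, skipping lines found in subset_path.

    Hypothetical helper: the name and parameters are assumptions made
    for this sketch, not part of the original thread.
    """
    with open(subset_path) as subset_file:
        # Lines keep their trailing newline, so matching is exact per line.
        # A final line lacking a newline will not match its newline-terminated twin.
        unwanted = set(subset_file)
    with open(superset_path) as src, open(out_path, "w") as dst:
        # filterfalse yields only the lines for which the membership test is false.
        dst.writelines(itertools.filterfalse(unwanted.__contains__, src))


def remove_subset_loop(subset_path, superset_path, out_path):
    """Explicit-loop variant of the same filter, for comparison."""
    with open(subset_path) as subset_file:
        unwanted = set(subset_file)
    with open(superset_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line not in unwanted:
                dst.write(line)
```

In the first version the iteration, the set lookup, and the write are all driven from `writelines` without a Python-level loop body, which is the property described above.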