How to remove subset from a file efficiently?
python at rcn.com
Sat Jan 14 13:38:01 CET 2006
> > b = set(file('/home/sajid/python/wip/stc/2/CBR0000333'))
> > file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP0000333')))
> > --
> > $ time ./cleanup_ray.py
> > real 0m5.451s
> > user 0m4.496s
> > sys 0m0.428s
> > (-: Damn! That saves a bit more time! Bravo!
[bonono at gmail.com]
> Have you tried the explicit loop variant with psyco? My experience is
> that psyco is pretty good at optimizing for loops, which usually results
> in faster code than even the built-in map/filter variant.
> Though it would just be a 1 or 2 sec difference (given what you already
> have), so it may not be important, but it could be fun.
The code is pretty tight and is now most likely I/O bound. If so,
further speed-ups will be hard to come by (even with psyco). The four
principal steps of reading, membership testing, filtering, and writing
are all C-coded methods that are linked directly together, with no
interpreter loop overhead or method lookups. Hard to beat.
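
For anyone who wants to time the two approaches side by side, here is a rough sketch of the pipeline above next to the explicit loop variant. It is an adaptation for Python 3, not the original script: file() becomes open() and itertools.ifilterfalse becomes itertools.filterfalse. The function names and the path parameters are made up for illustration.

```python
import itertools


def remove_subset(exclude_path, source_path, dest_path):
    """Copy the lines of source_path that do not appear in exclude_path."""
    # Load the lines to drop into a set for O(1) membership tests.
    with open(exclude_path) as exclude_file:
        exclude = set(exclude_file)
    # Reading, membership testing, filtering, and writing are chained
    # together; no Python-level loop body runs for each line.
    with open(source_path) as source, open(dest_path, "w") as dest:
        dest.writelines(itertools.filterfalse(exclude.__contains__, source))


def remove_subset_loop(exclude_path, source_path, dest_path):
    """Explicit loop variant with the same result, for timing comparison."""
    with open(exclude_path) as exclude_file:
        exclude = set(exclude_file)
    with open(source_path) as source, open(dest_path, "w") as dest:
        for line in source:
            if line not in exclude:
                dest.write(line)
```

Both versions compare whole lines, trailing newline included, so a final line without a newline will not match its newline-terminated twin in the other file.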