Generator Expressions and CSV

MRAB python at mrabarnett.plus.com
Fri Jul 17 14:49:32 EDT 2009


Zaki wrote:
> Hey all,
> 
> I'm really new to Python and this may seem like a really dumb
> question, but basically, I wrote a script to do the following. However,
> the processing time/memory usage is not what I'd like it to be. Any
> suggestions?
> 
> 
> Outline:
> 1. Read tab-delimited files from a directory. The files are of 3 types:
> install, update, and q. All 3 types contain ID values that are the
> only part of interest.
> 2. Using set() and set.add(), generate a set of unique IDs from
> install and update files.
> 3. Using the set created in (2), check the q files to see if there are
> matches for IDs. Keep all matches, and add any non-matches (which only
> occur once in the q file) to a queue of lines to be removed from the q
> files.
> 4. Remove the queued lines from each q file. (I haven't quite written
> the code for this, but I was going to implement this using csv.writer
> and rewriting all the lines in the file except for the ones in the
> removal queue).
> 
> Now, I've tried running this and it takes much longer than I'd like. I
> was wondering if there might be a better way to do things (I thought
> generator expressions might be a good way to attack this problem, as
> you could generate the set, and then check to see if there's a match,
> and write each line that way).
> 
Why are you checking and removing lines in two steps? Why not copy the
matching lines to a new q file and then replace the old file with the
new one (or, maybe, delete the new q file if no lines were removed)?
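
A minimal sketch of that single-pass approach, written for Python 3 and
resting on a few assumptions that are not in the original post: the ID is
taken to be the first tab-separated field of every row, blank rows are
treated as non-matches, and the function names collect_ids and
filter_q_file are made up for illustration.

```python
import csv
import os
import tempfile


def collect_ids(paths, id_column=0):
    """Return the set of IDs found in the given tab-delimited files.

    Assumption: the ID lives in column `id_column` (first field by default).
    """
    ids = set()
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f, delimiter="\t")
            # Generator expression: rows are consumed one at a time,
            # so no intermediate list of IDs is built.
            ids.update(row[id_column] for row in reader if row)
    return ids


def filter_q_file(q_path, known_ids, id_column=0):
    """Keep only the rows of q_path whose ID is in known_ids.

    Matching rows are copied to a new file in the same directory. The new
    file replaces the old one only if something was dropped; otherwise it
    is deleted. Returns the number of rows removed.
    """
    removed = 0
    directory = os.path.dirname(os.path.abspath(q_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, \
                open(q_path, newline="") as src:
            writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
            for row in csv.reader(src, delimiter="\t"):
                if row and row[id_column] in known_ids:
                    writer.writerow(row)
                else:
                    removed += 1  # non-match (or blank row): not copied
        if removed:
            os.replace(tmp_path, q_path)  # swap the new file into place
        else:
            os.remove(tmp_path)  # nothing changed, discard the copy
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return removed
```

Usage would be something like ids = collect_ids(install_and_update_paths)
followed by filter_q_file(path, ids) for each q file. Only the set of IDs
is held in memory; each q file is streamed row by row, so no removal queue
is needed. Note that tempfile.mkstemp creates the new file readable and
writable only by the current user, so permissions may need restoring after
the replace.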


