Generator Expressions and CSV

Zaki zaki.rahaman at gmail.com
Fri Jul 17 13:58:20 EDT 2009


Hey all,

I'm really new to Python, so this may seem like a really dumb
question. I wrote a script to do the following, but its processing
time and memory usage are not what I'd like them to be. Any
suggestions?


Outline:
1. Read tab-delimited files from a directory. The files are of 3
types: install, update, and q. All 3 types contain ID values, which
are the only part of interest.
2. Using set() and set.add(), build a set of unique IDs from the
install and update files.
3. Using the set created in (2), check the q files for matching IDs.
Keep all matches, and add any non-matches (each of which occurs only
once in the q file) to a queue of lines to be removed from the q
files.
4. Remove the queued lines from each q file. (I haven't quite written
the code for this, but I was going to implement it with csv.writer,
rewriting every line in the file except the ones in the removal
queue.)
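
For concreteness, steps 1 and 2 could look roughly like the sketch
below. It is only an illustration under assumptions not stated above:
the ID is taken to be the first column of each row, and the name
collect_ids and its id_column parameter are made up for this example.

```python
import csv


def collect_ids(paths, id_column=0):
    """Return the set of unique IDs found in the given tab-delimited files.

    Assumption: the ID sits in column `id_column` (first column by default).
    """
    ids = set()
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            # set.update() consumes the generator one row at a time, so
            # only the IDs are kept in memory, never a whole file.
            ids.update(
                row[id_column] for row in reader if len(row) > id_column
            )
    return ids
```

The paths argument would be the install and update files, for example
gathered with glob.glob() if the file names follow some pattern (the
actual naming scheme is not given here).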

Now, I've tried running this and it takes much longer than I'd like. I
was wondering if there might be a better way to do things. I thought
generator expressions might be a good way to attack this problem: you
could build the set first, then check each q line for a match and
write it out as you go, instead of queuing lines for removal.
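
A minimal sketch of that generator idea, combining steps 3 and 4 into
a single pass over a q file, might look like this. It assumes the ID is
in the first column and that known_ids is the set from step 2; the
name filter_q_file is hypothetical. Rows are written to a temporary
file in the same directory, which then replaces the original, so no
removal queue is needed.

```python
import csv
import os
import tempfile


def filter_q_file(path, known_ids, id_column=0):
    """Rewrite the file at `path`, keeping only rows whose ID is known.

    Assumption: the ID sits in column `id_column` (first column by default).
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as source, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as target:
        reader = csv.reader(source, delimiter="\t")
        writer = csv.writer(target, delimiter="\t", lineterminator="\n")
        # Generator expression: each row is tested against the set and
        # written immediately, so the file is never loaded as a whole.
        kept = (
            row
            for row in reader
            if len(row) > id_column and row[id_column] in known_ids
        )
        writer.writerows(kept)
    # Swap the filtered copy in place of the original file.
    os.replace(target.name, path)
```

Note that this sketch also drops blank lines, and csv.writer may quote
fields differently from the original file, so it is worth trying on a
copy of the data first.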




