speed question, reading csv using takewhile() and dropwhile()
Jonathan Gardner
jgardner at jonathangardner.net
Sat Feb 20 20:18:24 EST 2010
On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> Thanks for the help, this is considerably faster and easier to read (see
> below). I changed it to avoid the "break" and I think it makes it easy to
> understand. I am checking the conditions each time slows it but it is worth
> it to me at this time.
>
>
It seems you are beginning to understand that programmer time is more
valuable than machine time. Congratulations.
> def read_data_file(filename):
>     reader = csv.reader(open(filename, "U"),delimiter='\t')
>
>     data = []
>     mask = []
>     outliers = []
>     modified = []
>
>     data_append = data.append
>     mask_append = mask.append
>     outliers_append = outliers.append
>     modified_append = modified.append
>
>
I know some people do this to speed things up. Really, I don't think it's
necessary or wise to do so.
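If you want to check whether the aliasing buys anything on your machine, a rough timing sketch along these lines would do. The function names and the fake rows are made up for illustration, and the numbers will vary between Python versions, so treat it as a measurement aid rather than a verdict:

```python
import timeit

def append_direct(rows):
    """Look up out.append on every iteration."""
    out = []
    for row in rows:
        out.append(row)
    return out

def append_cached(rows):
    """Look up the bound method once, as in the quoted code."""
    out = []
    out_append = out.append
    for row in rows:
        out_append(row)
    return out

# Hypothetical stand-in for the rows a csv.reader would yield.
rows = [[str(i), str(i + 1)] for i in range(10000)]

direct = timeit.timeit(lambda: append_direct(rows), number=100)
cached = timeit.timeit(lambda: append_cached(rows), number=100)
print("direct: %.4fs  cached: %.4fs" % (direct, cached))
```

Both functions build the same list; only the attribute lookup differs. If the gap is small next to the time spent reading and splitting the file, the plain version is the easier one to live with.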
>     maskcount = 0
>     outliercount = 0
>     modifiedcount = 0
>
>     for row in reader:
>         if '[MASKS]' in row:
>             maskcount += 1
>         if '[OUTLIERS]' in row:
>             outliercount += 1
>         if '[MODIFIED]' in row:
>             modifiedcount += 1
>         if not any((maskcount, outliercount, modifiedcount, not row)):
>             data_append(row)
>         elif not any((outliercount, modifiedcount, not row)):
>             mask_append(row)
>         elif not any((modifiedcount, not row)):
>             outliers_append(row)
>         else:
>             if row: modified_append(row)
>
>
Just playing with the logic here:
1. Notice that if "not row" is True, nothing happens? Pull it out
explicitly.
2. Notice how it switches from mode to mode? Program it more explicitly.
Here's my suggestion:
    # Module-level lists that the parsers below fill in.
    data, masks, outliers, modified = [], [], [], []

    def parse_masks(reader):
        for row in reader:
            if not row: continue
            # The sub-parser consumes the rest of the reader, so return
            # afterwards; otherwise the marker row itself gets appended.
            elif '[OUTLIERS]' in row: return parse_outliers(reader)
            elif '[MODIFIED]' in row: return parse_modified(reader)
            masks.append(row)

    def parse_outliers(reader):
        for row in reader:
            if not row: continue
            elif '[MODIFIED]' in row: return parse_modified(reader)
            outliers.append(row)

    def parse_modified(reader):
        for row in reader:
            if not row: continue
            modified.append(row)

    # Top level: rows before the first marker are plain data.
    for row in reader:
        if not row: continue
        elif '[MASKS]' in row: parse_masks(reader)
        elif '[OUTLIERS]' in row: parse_outliers(reader)
        elif '[MODIFIED]' in row: parse_modified(reader)
        else: data.append(row)
Since there is global state involved, you may want to save yourself some
trouble in the future and put the above in a class where separate parsers
can be kept separate.
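A minimal sketch of what such a class might look like follows. The class name, the SECTIONS table and the sample text are my own inventions for illustration, and I have swapped the mutually calling functions for a single "current list" variable, which also means the sections may appear in any order. It assumes Python 3 for io.StringIO with plain strings:

```python
import csv
import io

class SectionedFileParser:
    """Sorts rows into data, masks, outliers and modified lists."""

    # Marker cell -> name of the attribute collecting the rows after it.
    SECTIONS = {
        '[MASKS]': 'masks',
        '[OUTLIERS]': 'outliers',
        '[MODIFIED]': 'modified',
    }

    def __init__(self):
        self.data = []
        self.masks = []
        self.outliers = []
        self.modified = []

    def parse(self, reader):
        current = self.data  # rows before any marker are plain data
        for row in reader:
            if not row:
                continue  # skip blank lines
            for marker, name in self.SECTIONS.items():
                if marker in row:
                    current = getattr(self, name)  # switch mode
                    break
            else:
                # No marker found in this row: store it in the current list.
                current.append(row)
        return self

# Made-up sample input standing in for the real tab-delimited file.
sample = "1\t2\n3\t4\n\n[MASKS]\nm1\tm2\n[OUTLIERS]\no1\to2\n[MODIFIED]\nx1\tx2\n"
parser = SectionedFileParser().parse(csv.reader(io.StringIO(sample), delimiter='\t'))
print(parser.data, parser.masks, parser.outliers, parser.modified)
```

Each instance owns its own lists, so parsing two files no longer means sharing or resetting globals.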
It looks like your program is turning into a regular old parser. Any format
that is a little more than trivial to parse will need a real parser like the
above.
--
Jonathan Gardner
jgardner at jonathangardner.net