<div class="gmail_quote">On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <span dir="ltr">&lt;<a href="mailto:vincent@vincentdavis.net">vincent@vincentdavis.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<span></span><div>Thanks for the help; this is considerably faster and easier to read (see below). I changed it to avoid the "break", which I think makes it easier to understand. Checking the conditions on every row slows it down, but it is worth it to me at this time. </div>
<br></blockquote><div><br>It seems you are beginning to understand that programmer time is more valuable than machine time. Congratulations.<br> </div><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im"><div></div><div>def read_data_file(filename):</div><div> reader = csv.reader(open(filename, "U"),delimiter='\t')</div><div><br></div></div><div> data = []</div>
<div> mask = []</div><div> outliers = []</div><div> modified = []</div><div> </div><div> data_append = data.append</div><div> mask_append = mask.append</div><div> outliers_append = outliers.append</div>
<div> modified_append = modified.append</div><div> <br></div></blockquote><div><br>I know some people bind methods such as data.append to local names like this to save an attribute lookup on each call. Really, I don't think it's necessary or wise to do so: the saving is usually small, and the extra names make the code harder to read.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div></div><div> maskcount = 0</div><div> outliercount = 0</div><div> modifiedcount = 0</div><div class="im"><div> </div><div> for row in reader:</div><div> if '[MASKS]' in row:</div>
</div><div> maskcount += 1</div><div> if '[OUTLIERS]' in row:</div><div> outliercount += 1</div><div> if '[MODIFIED]' in row:</div><div> modifiedcount += 1</div>
<div>
if not any((maskcount, outliercount, modifiedcount, not row)):</div><div> data_append(row)</div><div> elif not any((outliercount, modifiedcount, not row)):</div><div> mask_append(row)</div>
<div> elif not any((modifiedcount, not row)):</div><div> outliers_append(row)</div><div> else:</div><div> if row: modified_append(row)</div><div> </div></blockquote><div><br>
Just playing with the logic here:<br><br>1. Notice that if "not row" is True, nothing happens? Pull it out explicitly.<br><br>2. Notice how it switches from mode to mode? Program it more explicitly.<br><br>Here's my suggestion:<br>
<br>def parse_masks(reader):<br> for row in reader:<br> if not row: continue<br> elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
elif '[MODIFIED]' in row: parse_modified(reader)<br>
 else: masks.append(row)<br><br>def parse_outliers(reader):<br>
for row in reader:<br>
if not row: continue<br> elif '[MODIFIED]' in row: parse_modified(reader)<br>
 else: outliers.append(row)<br>
<br>def parse_modified(reader):<br>
for row in reader:<br>
if not row: continue<br> modified.append(row)<br>
<br>for row in reader:<br>
if not row: continue<br>
elif '[MASKS]' in row: parse_masks(reader)<br>
elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
elif '[MODIFIED]' in row: parse_modified(reader)<br>
else: data.append(row)<br>
<br>Since there is global state involved (the data, masks, outliers and modified lists), you may want to save yourself some trouble in the future and put the above in a class, so that each parser instance keeps its own state.<br><br>It looks like your program is turning into a regular old parser. Any format that is a little more than trivial to parse will need a real parser like the above.<br>
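<br>A minimal sketch of the class-based idea above, assuming Python 3 and a tab-delimited file whose sections start with [MASKS], [OUTLIERS] and [MODIFIED] marker rows. The class name SectionedFileParser and its attributes are hypothetical. It tracks the current section in a variable instead of calling nested parse functions, so it is a variation on the code above rather than a drop-in copy, and it does not store the marker rows themselves.<br>

```python
import csv
import io


class SectionedFileParser:
    """Hypothetical parser; each instance owns its own result lists."""

    # Marker cell -> name of the attribute that collects the rows after it.
    SECTIONS = {
        '[MASKS]': 'masks',
        '[OUTLIERS]': 'outliers',
        '[MODIFIED]': 'modified',
    }

    def __init__(self):
        self.data = []
        self.masks = []
        self.outliers = []
        self.modified = []

    def parse(self, reader):
        current = self.data  # rows before any marker are plain data
        for row in reader:
            if not row:
                continue  # skip blank lines
            for marker, name in self.SECTIONS.items():
                if marker in row:
                    # Switch mode; the marker row itself is not stored.
                    current = getattr(self, name)
                    break
            else:
                # No marker found in this row: it belongs to the current section.
                current.append(row)
        return self


# Usage with an in-memory sample (made-up contents, for illustration only).
sample = "a\tb\n\n[MASKS]\nm1\tm2\n[OUTLIERS]\no1\n[MODIFIED]\nx1\tx2\n"
parsed = SectionedFileParser().parse(csv.reader(io.StringIO(sample), delimiter='\t'))
print(parsed.data, parsed.masks, parsed.outliers, parsed.modified)
```

Because each instance owns its four lists, two files can be parsed side by side without sharing state.<br>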
<br></div></div>-- <br>Jonathan Gardner<br><a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a><br>