<span></span>Thanks again for the comments. I am not sure I will implement all of them, but I will separate out the "if not row" check. The files have some extraneous blank rows in the middle, and I need to be sure not to import them as blank rows. <div>
I am actually having trouble with this filling my system memory. I posted a separate question about it; the subject is "Why is this filling my sys memory" or something like that.</div><div>It might be that my 1-year-old son has been trying to help for the last hour. It is very distracting.</div>
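For the blank-row problem specifically, here is a small sketch, assuming the files are tab-delimited as in read_data_file. csv.reader yields an empty list for a completely blank line, so a generator can drop those rows before anything else sees them. The helper name nonblank_rows and the sample text are made up for illustration, and treating a row of empty cells as blank is an assumption that goes a little further than a plain "if not row" test.

```python
import csv
import io


def nonblank_rows(reader):
    """Yield only rows that contain at least one non-empty cell."""
    for row in reader:
        # An empty list (blank line) and a row of empty strings both fail this test.
        if any(cell.strip() for cell in row):
            yield row


# Sample input: a blank line and a tab-only line sit between two real rows.
sample = "a\tb\n\n\t\nc\td\n"
rows = list(nonblank_rows(csv.reader(io.StringIO(sample), delimiter="\t")))
print(rows)
```

Because it is a generator, it holds only one row at a time, so the filter itself does not add to memory use.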
<div><br></div><div><div name="mailplane_signature"><p><strong>Vincent Davis</strong><br>
<a href="mailto:vincent@vincentdavis.net">vincent@vincentdavis.net</a></p></div><br><br><div class="gmail_quote">On Sat, Feb 20, 2010 at 6:18 PM, Jonathan Gardner <span dir="ltr"><<a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="gmail_quote"><div class="im">On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <span dir="ltr"><<a href="mailto:vincent@vincentdavis.net" target="_blank">vincent@vincentdavis.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">
<span></span><div>Thanks for the help, this is considerably faster and easier to read (see below). I changed it to avoid the "break", and I think that makes it easier to understand. Checking the conditions on every row slows it down, but it is worth it to me at this time. </div>
<br></blockquote></div><div><br>It seems you are beginning to understand that programmer time is more valuable than machine time. Congratulations.<br> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">
<div><div></div><div>def read_data_file(filename):</div><div>&nbsp;&nbsp;&nbsp;&nbsp;reader = csv.reader(open(filename, "U"), delimiter='\t')</div><div><br></div></div><div>&nbsp;&nbsp;&nbsp;&nbsp;data = []</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;mask = []</div><div>&nbsp;&nbsp;&nbsp;&nbsp;outliers = []</div><div>&nbsp;&nbsp;&nbsp;&nbsp;modified = []</div><div> </div><div>&nbsp;&nbsp;&nbsp;&nbsp;data_append = data.append</div><div>&nbsp;&nbsp;&nbsp;&nbsp;mask_append = mask.append</div><div>&nbsp;&nbsp;&nbsp;&nbsp;outliers_append = outliers.append</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;modified_append = modified.append</div><div> <br></div></blockquote></div><div><br>I know some people do this to speed things up. Really, I don't think it's necessary or wise to do so.<br> </div><div class="im">
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">
<div></div><div> maskcount = 0</div><div> outliercount = 0</div><div> modifiedcount = 0</div><div><div> </div><div> for row in reader:</div><div> if '[MASKS]' in row:</div>
</div><div> maskcount += 1</div><div> if '[OUTLIERS]' in row:</div><div> outliercount += 1</div><div> if '[MODIFIED]' in row:</div><div> modifiedcount += 1</div>
<div>
if not any((maskcount, outliercount, modifiedcount, not row)):</div><div> data_append(row)</div><div> elif not any((outliercount, modifiedcount, not row)):</div><div> mask_append(row)</div>
<div> elif not any((modifiedcount, not row)):</div><div> outliers_append(row)</div><div> else:</div><div> if row: modified_append(row)</div><div> </div></blockquote></div><div>
<br>
Just playing with the logic here:<br><br>1. Notice that if "not row" is True, nothing happens? Pull it out explicitly.<br><br>2. Notice how it switches from mode to mode? Program it more explicitly.<br><br>Here's my suggestion:<br>
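<br># data, masks, outliers and modified are module-level lists.<br>
# Each parse_* call consumes the rest of the shared reader, so the calling loop ends once it returns.<br>
<br>def parse_masks(reader):<br>
&nbsp;&nbsp;&nbsp;&nbsp;for row in reader:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if not row: continue<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif '[MODIFIED]' in row: parse_modified(reader)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else: masks.append(row)&nbsp;&nbsp;# the else keeps section header rows out of masks<br>
<br>def parse_outliers(reader):<br>
&nbsp;&nbsp;&nbsp;&nbsp;for row in reader:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if not row: continue<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif '[MODIFIED]' in row: parse_modified(reader)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else: outliers.append(row)<br>
<br>def parse_modified(reader):<br>
&nbsp;&nbsp;&nbsp;&nbsp;for row in reader:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if not row: continue<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;modified.append(row)<br>
<br>for row in reader:<br>
&nbsp;&nbsp;&nbsp;&nbsp;if not row: continue<br>
&nbsp;&nbsp;&nbsp;&nbsp;elif '[MASKS]' in row: parse_masks(reader)<br>
&nbsp;&nbsp;&nbsp;&nbsp;elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
&nbsp;&nbsp;&nbsp;&nbsp;elif '[MODIFIED]' in row: parse_modified(reader)<br>
&nbsp;&nbsp;&nbsp;&nbsp;else: data.append(row)<br>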
<br>Since there is global state involved, you may want to save yourself some trouble in the future and put the above in a class, so that each parser instance keeps its own state.<br><br>It looks like your program is turning into a regular old parser. Any format that is more than trivial to parse will need a real parser like the above.<br>
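One way the class idea could look, as a rough sketch rather than a finished design: the hypothetical SectionParser class below keeps the four lists on the instance and switches sections by header marker instead of by nested function calls. The class name, the SECTIONS mapping, the parse method, and the sample text are all assumptions made for illustration; only the [MASKS], [OUTLIERS], and [MODIFIED] markers come from the file format discussed in this thread.

```python
import csv
import io


class SectionParser:
    """Hypothetical parser that collects rows into per-section lists."""

    # Maps a header marker to the attribute that receives the rows after it.
    SECTIONS = {
        "[MASKS]": "masks",
        "[OUTLIERS]": "outliers",
        "[MODIFIED]": "modified",
    }

    def __init__(self):
        self.data = []
        self.masks = []
        self.outliers = []
        self.modified = []

    def parse(self, reader):
        current = self.data  # rows before any header are plain data
        for row in reader:
            if not row:
                continue  # skip blank rows
            # Look for a section marker anywhere in the row.
            header = next((cell for cell in row if cell in self.SECTIONS), None)
            if header is not None:
                current = getattr(self, self.SECTIONS[header])
                continue  # the header row itself is not stored
            current.append(row)
        return self


# Made-up sample with blank lines inside and between sections.
sample = "a\tb\n\n[MASKS]\nm1\tm2\n[OUTLIERS]\n\no1\n[MODIFIED]\nx1\tx2\n"
parser = SectionParser().parse(csv.reader(io.StringIO(sample), delimiter="\t"))
print(parser.data, parser.masks, parser.outliers, parser.modified)
```

Unlike the nested functions above, this sketch accepts the sections in any order, which may or may not be what the format allows. Two files can be parsed side by side with two instances, and neither touches the other's lists.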
<br></div></div>-- <br><div><div></div><div class="h5">Jonathan Gardner<br><a href="mailto:jgardner@jonathangardner.net" target="_blank">jgardner@jonathangardner.net</a><br>
</div></div></blockquote></div><br></div>