<span></span>Thanks again for the comment. I am not sure I will implement all of it, but I will separate out the "if not row" check. The files have some extraneous blank rows in the middle, and I need to be sure not to import them as blank rows. <div>

I am actually having trouble with this filling my system memory. I posted a separate question about it; the subject is "Why is this filling my sys memory" or something like that.</div><div>It might be that my 1-year-old son has been trying to help for the last hour, which is very distracting.</div>
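On the blank rows mentioned above, one way to keep them out is to filter them as the rows are read. This is only a minimal sketch: the helper name `non_blank_rows` and the sample text are hypothetical, and it assumes that a row whose cells are all empty or whitespace should also count as blank.

```python
import csv
import io


def non_blank_rows(reader):
    """Yield only rows that contain at least one non-empty cell."""
    for row in reader:
        # csv.reader returns [] for an empty line; a line holding only
        # tabs or spaces yields cells that are empty after stripping.
        if any(cell.strip() for cell in row):
            yield row


# Hypothetical sample standing in for one of the tab-delimited files.
sample = "a\tb\n\n\t\nc\td\n"
rows = list(non_blank_rows(csv.reader(io.StringIO(sample), delimiter="\t")))
print(rows)
```

Because the helper is a generator, it can wrap the reader without building an extra list of its own.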

<div><br></div><div><div name="mailplane_signature">Vincent Davis</div><br><br><div class="gmail_quote">On Sat, Feb 20, 2010 at 6:18 PM, Jonathan Gardner <span dir="ltr"><<a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="gmail_quote"><div class="im">On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <span dir="ltr"><<a href="mailto:vincent@vincentdavis.net" target="_blank">vincent@vincentdavis.net</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">

<span></span><div>Thanks for the help, this is considerably faster and easier to read (see below). I changed it to avoid the "break", and I think that makes it easier to understand. Checking the conditions each time slows it down, but it is worth it to me at this time. </div>


<br></blockquote></div><div><br>It seems you are beginning to understand that programmer time is more valuable than machine time. Congratulations.<br> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">


<div><div></div><div>def read_data_file(filename):</div><div>    reader = csv.reader(open(filename, "U"),delimiter='\t')</div><div><br></div></div><div>    data = []</div>

<div>    mask = []</div><div>    outliers = []</div><div>    modified = []</div><div>    </div><div>    data_append = data.append</div><div>    mask_append = mask.append</div><div>    outliers_append = outliers.append</div>


<div>    modified_append = modified.append</div><div>   <br></div></blockquote></div><div><br>I know some people do this to speed things up. Really, I don't think it's necessary or wise to do so.<br> </div><div class="im">

<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">

<div></div><div>    maskcount = 0</div><div>    outliercount = 0</div><div>    modifiedcount = 0</div><div><div>    </div><div>    for row in reader:</div><div>        if '[MASKS]' in row:</div>

</div><div>            maskcount += 1</div><div>        if '[OUTLIERS]' in row:</div><div>            outliercount += 1</div><div>        if '[MODIFIED]' in row:</div><div>            modifiedcount += 1</div>


<div>

        if not any((maskcount, outliercount, modifiedcount, not row)):</div><div>            data_append(row)</div><div>        elif not any((outliercount, modifiedcount, not row)):</div><div>            mask_append(row)</div>


<div>        elif not any((modifiedcount, not row)):</div><div>            outliers_append(row)</div><div>        else:</div><div>            if row: modified_append(row)</div><div>            </div></blockquote></div><div>

<br>

Just playing with the logic here:<br><br>1. Notice that if "not row" is True, nothing happens? Pull it out explicitly.<br><br>2. Notice how it switches from mode to mode? Program it more explicitly.<br><br>Here's my suggestion:<br>


<br># Module-level lists the parsers fill in (the global state).<br>
data, masks, outliers, modified = [], [], [], []<br>
<br>def parse_masks(reader):<br>
    for row in reader:<br>
        if not row: continue<br>
        elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
        elif '[MODIFIED]' in row: parse_modified(reader)<br>
        else: masks.append(row)<br>
<br>def parse_outliers(reader):<br>
    for row in reader:<br>
        if not row: continue<br>
        elif '[MODIFIED]' in row: parse_modified(reader)<br>
        else: outliers.append(row)<br>
<br>def parse_modified(reader):<br>
    for row in reader:<br>
        if not row: continue<br>
        modified.append(row)<br>
<br>for row in reader:<br>
    if not row: continue<br>
    elif '[MASKS]' in row: parse_masks(reader)<br>
    elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
    elif '[MODIFIED]' in row: parse_modified(reader)<br>
    else: data.append(row)<br>

<br>Since there is global state involved, you may want to save yourself some trouble in the future and put the above in a class where separate parsers can be kept separate.<br><br>It looks like your program is turning into a regular old parser. Any format that is a little more than trivial to parse will need a real parser like the above.<br>


<br></div></div>-- <br><div><div></div><div class="h5">Jonathan Gardner<br><a href="mailto:jgardner@jonathangardner.net" target="_blank">jgardner@jonathangardner.net</a><br>

</div></div></blockquote></div><br></div>
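The suggestion in the quoted reply to move the global state into a class could look roughly like the sketch below. The class name `SectionParser`, its attribute names, and the sample input are hypothetical, not something from the thread. Unlike the nested functions in the quoted code, this version keeps one "current section" list and accepts the markers in any order.

```python
import csv
import io


class SectionParser:
    """Collect rows into per-section lists, switching on marker rows."""

    # Marker text -> attribute name; "data" holds rows before any marker.
    SECTIONS = {
        "[MASKS]": "masks",
        "[OUTLIERS]": "outliers",
        "[MODIFIED]": "modified",
    }

    def __init__(self):
        self.data = []
        self.masks = []
        self.outliers = []
        self.modified = []

    def parse(self, reader):
        current = self.data
        for row in reader:
            if not row:
                continue  # skip blank rows
            for marker, name in self.SECTIONS.items():
                if marker in row:
                    # A marker row switches sections and is not stored.
                    current = getattr(self, name)
                    break
            else:
                current.append(row)
        return self


# Hypothetical sample standing in for one of the tab-delimited files.
sample = "1\t2\n\n[MASKS]\n3\t4\n[OUTLIERS]\n5\t6\n\n[MODIFIED]\n7\t8\n"
parser = SectionParser().parse(csv.reader(io.StringIO(sample), delimiter="\t"))
print(parser.data, parser.masks, parser.outliers, parser.modified)
```

Each `SectionParser` instance owns its lists, so parsing a second file only needs a second instance rather than resetting globals.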