<div class="gmail_quote">On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <span dir="ltr">&lt;<a href="mailto:vincent@vincentdavis.net">vincent@vincentdavis.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<span></span><div>Thanks for the help, this is considerably faster and easier to read (see below). I changed it to avoid the "break" and I think it makes it easier to understand. Checking the conditions each time slows it down, but it is worth it to me at this time. </div>

<br></blockquote><div><br>It seems you are beginning to understand that programmer time is more valuable than machine time. Congratulations.<br> </div><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im"><div></div><div>def read_data_file(filename):</div><div>    reader = csv.reader(open(filename, "U"),delimiter='\t')</div><div><br></div></div><div>    data = []</div>

<div>    mask = []</div><div>    outliers = []</div><div>    modified = []</div><div>    </div><div>    data_append = data.append</div><div>    mask_append = mask.append</div><div>    outliers_append = outliers.append</div>


<div>    modified_append = modified.append</div><div>   <br></div></blockquote><div><br>I know some people bind methods like data.append to local names to speed things up. Really, I don't think it's necessary or wise to do so here.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div></div><div>    maskcount = 0</div><div>    outliercount = 0</div><div>    modifiedcount = 0</div><div class="im"><div>    </div><div>    for row in reader:</div><div>        if '[MASKS]' in row:</div>

</div><div>            maskcount += 1</div><div>        if '[OUTLIERS]' in row:</div><div>            outliercount += 1</div><div>        if '[MODIFIED]' in row:</div><div>            modifiedcount += 1</div>

<div>

        if not any((maskcount, outliercount, modifiedcount, not row)):</div><div>            data_append(row)</div><div>        elif not any((outliercount, modifiedcount, not row)):</div><div>            mask_append(row)</div>


<div>        elif not any((modifiedcount, not row)):</div><div>            outliers_append(row)</div><div>        else:</div><div>            if row: modified_append(row)</div><div>            </div></blockquote><div><br>

Just playing with the logic here:<br><br>1. Notice that if "not row" is True, nothing happens? Pull it out explicitly.<br><br>2. Notice how it switches from mode to mode? Program it more explicitly.<br><br>Here's my suggestion:<br>

<br># Module-level lists that the parsers below append to.<br>
data, masks, outliers, modified = [], [], [], []<br>
<br>
def parse_masks(reader):<br>
    for row in reader:<br>
        if not row: continue<br>
        elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
        elif '[MODIFIED]' in row: parse_modified(reader)<br>
        else: masks.append(row)  # else keeps header rows out of masks<br>
<br>
def parse_outliers(reader):<br>
    for row in reader:<br>
        if not row: continue<br>
        elif '[MODIFIED]' in row: parse_modified(reader)<br>
        else: outliers.append(row)<br>
<br>
def parse_modified(reader):<br>
    for row in reader:<br>
        if not row: continue<br>
        else: modified.append(row)<br>
<br>
# Top level: rows before any section header are plain data.<br>
# Each parse_* call consumes the rest of the reader before it returns.<br>
for row in reader:<br>
    if not row: continue<br>
    elif '[MASKS]' in row: parse_masks(reader)<br>
    elif '[OUTLIERS]' in row: parse_outliers(reader)<br>
    elif '[MODIFIED]' in row: parse_modified(reader)<br>
    else: data.append(row)<br>

<br>Since there is global state involved, you may want to save yourself some trouble in the future and put the above in a class where separate parsers can be kept separate.<br><br>It looks like your program is turning into a regular old parser. Any format that is a little more than trivial to parse will need a real parser like the above.<br>
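<br>A minimal sketch of the class idea, assuming Python 3. The names SectionParser, SECTIONS and parse are hypothetical (nothing in this thread defines them), and the sample input is made up. Unlike the nested functions, this version tracks the current section in one loop, so it also accepts sections in any order; header rows are not stored.<br>

```python
import csv
import io


class SectionParser:
    """Hypothetical class: keeps one file's parsed sections together."""

    # Section header -> attribute name (headers as used in this thread).
    SECTIONS = {
        '[MASKS]': 'masks',
        '[OUTLIERS]': 'outliers',
        '[MODIFIED]': 'modified',
    }

    def __init__(self):
        self.data = []
        self.masks = []
        self.outliers = []
        self.modified = []

    def parse(self, reader):
        current = self.data              # rows before any header are data
        for row in reader:
            if not row:
                continue                 # skip blank lines
            for header, name in self.SECTIONS.items():
                if header in row:
                    current = getattr(self, name)  # switch section
                    break
            else:
                current.append(row)      # ordinary row: store it
        return self


# Made-up sample input, tab-delimited like the file in the thread.
sample = "a\tb\n\n[MASKS]\nm1\tm2\n[OUTLIERS]\no1\n[MODIFIED]\nx1\tx2\n"
parser = SectionParser().parse(csv.reader(io.StringIO(sample), delimiter='\t'))
```

Because each SectionParser instance owns its own lists, parsing two files no longer mixes their rows in shared globals.<br>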

<br></div></div>-- <br>Jonathan Gardner<br><a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a><br>