speed question, reading csv using takewhile() and dropwhile()

Vincent Davis vincent at vincentdavis.net
Sun Feb 21 01:21:11 CET 2010


Thanks for the help, this is considerably faster and easier to read (see
below). I changed it to avoid the "break", which I think makes it easier to
understand. Checking the conditions on every row slows it down, but that is
worth it to me at this time.
Thanks again
Vincent

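import csv

def read_data_file(filename):
    # "U" asks for universal newlines (Python 2); on Python 3 use
    # open(filename, newline="") instead.
    reader = csv.reader(open(filename, "U"), delimiter='\t')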

    data = []
    mask = []
    outliers = []
    modified = []

    data_append = data.append
    mask_append = mask.append
    outliers_append = outliers.append
    modified_append = modified.append

    maskcount = 0
    outliercount = 0
    modifiedcount = 0

    for row in reader:
        if '[MASKS]' in row:
            maskcount += 1
        if '[OUTLIERS]' in row:
            outliercount += 1
        if '[MODIFIED]' in row:
            modifiedcount += 1
        if not any((maskcount, outliercount, modifiedcount, not row)):
            data_append(row)
        elif not any((outliercount, modifiedcount, not row)):
            mask_append(row)
        elif not any((modifiedcount, not row)):
            outliers_append(row)
        elif row:
            modified_append(row)

    data = data[1:]
    mask = mask[3:]
    outliers = outliers[3:]
    modified = modified[3:]
    return [data, mask, outliers, modified]
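
For what it's worth, the per-row condition checks could probably be avoided by remembering which list is currently being filled and switching it when the next marker shows up. This is only a rough sketch, not something I have timed. The name split_sections is made up for illustration, it assumes the markers appear at most once and in the order [MASKS], [OUTLIERS], [MODIFIED], and it leaves out the header slicing done at the end of read_data_file.

```python
import csv
import io


def split_sections(rows):
    """Split parsed rows into [data, mask, outliers, modified].

    Hypothetical helper: assumes each marker appears at most once and in
    the order [MASKS], [OUTLIERS], [MODIFIED]. Empty rows are skipped, and
    each marker row is stored as the first row of its own section.
    """
    data, mask, outliers, modified = [], [], [], []
    # (marker, target list) pairs that have not been seen yet, in file order.
    pending = [("[MASKS]", mask), ("[OUTLIERS]", outliers), ("[MODIFIED]", modified)]
    current = data
    for row in rows:
        if not row:
            continue
        # Only the next expected marker is tested, not all three.
        if pending and pending[0][0] in row:
            current = pending.pop(0)[1]
        current.append(row)
    return [data, mask, outliers, modified]


# Small in-memory example standing in for a real tab-delimited file.
sample = "h1\th2\n1\t2\n\n[MASKS]\na\tb\n[OUTLIERS]\nc\n[MODIFIED]\nd\n"
rows = csv.reader(io.StringIO(sample), delimiter="\t")
data, mask, outliers, modified = split_sections(rows)
```

If the markers can appear out of order, this will behave differently from the counter version above, so treat it as a starting point only.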



On Fri, Feb 19, 2010 at 4:36 PM, Jonathan Gardner <
jgardner at jonathangardner.net> wrote:

>
> On Fri, Feb 19, 2010 at 1:58 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
>
>> In reference to the several comments that "[x for x in read] is basically
>> a copy of the entire list. This isn't necessary." (or list(read)): I had
>> thought I had a problem with having iterators in the takewhile() statement.
>> I thought I had tested it and it didn't work. It seems I was wrong. It
>> clearly works. I'll make this change and see if it is any better.
>>
>> I actually don't plan to read them all in at once, only as needed, but I
>> do need the whole file in an array to perform some mathematics on it and
>> compare different files. So my interest was in making it faster to open them
>> as needed. Part of it may be disk speed, since the files are about 5 MB
>> each.
>>
>>
>
> Record your numbers in an array and then work your magic on them later.
> Don't store the entire file in memory, though.
>
> --
> Jonathan Gardner
> jgardner at jonathangardner.net
>
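
To illustrate the point quoted above, that takewhile() and dropwhile() accept the csv reader directly with no list(read) copy, here is a minimal sketch. The sample text and variable names are invented for the example. One thing to watch: takewhile() has already pulled the marker row off the iterator by the time it stops, so that row is gone from whatever is read next.

```python
import csv
import io
from itertools import dropwhile, takewhile

# In-memory stand-in for a tab-delimited file with one section marker.
sample = "h1\th2\n1\t2\n[MASKS]\na\tb\n"


def not_marker(row):
    """True for every row before the '[MASKS]' marker row."""
    return "[MASKS]" not in row


# takewhile() reads from the reader lazily; no copy of the rows is made first.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
data = list(takewhile(not_marker, reader))
# The marker row was consumed when the predicate failed, so the reader
# continues with the row after '[MASKS]'.
mask = list(reader)

# dropwhile() on a fresh reader keeps the marker row and everything after it.
reader2 = csv.reader(io.StringIO(sample), delimiter="\t")
tail = list(dropwhile(not_marker, reader2))
```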