speed question, reading csv using takewhile() and dropwhile()
Vincent Davis
vincent at vincentdavis.net
Sat Feb 20 19:21:11 EST 2010
Thanks for the help, this is considerably faster and easier to read (see
below). I changed it to avoid the "break", which I think makes it easier to
understand. Checking the conditions on every row slows it down, but that is
worth it to me at this time.
Thanks again
Vincent
import csv

def read_data_file(filename):
    # "U": universal-newline mode; the file is tab-delimited
    reader = csv.reader(open(filename, "U"),delimiter='\t')
    data = []
    mask = []
    outliers = []
    modified = []
    # bind the append methods once to avoid attribute lookups in the loop
    data_append = data.append
    mask_append = mask.append
    outliers_append = outliers.append
    modified_append = modified.append
    maskcount = 0
    outliercount = 0
    modifiedcount = 0
    for row in reader:
        # a nonzero count means that section marker has been seen
        if '[MASKS]' in row:
            maskcount += 1
        if '[OUTLIERS]' in row:
            outliercount += 1
        if '[MODIFIED]' in row:
            modifiedcount += 1
        # empty rows fall through to the else branch and are skipped
        if not any((maskcount, outliercount, modifiedcount, not row)):
            data_append(row)
        elif not any((outliercount, modifiedcount, not row)):
            mask_append(row)
        elif not any((modifiedcount, not row)):
            outliers_append(row)
        else:
            if row: modified_append(row)
    # drop the header rows of each section
    data = data[1:]
    mask = mask[3:]
    outliers = outliers[3:]
    modified = modified[3:]
    return [data, mask, outliers, modified]
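
For comparison, the per-row any() checks could be replaced by a single "current section" variable that is switched whenever a marker row is seen. This is only a sketch under assumptions: the function name split_sections, the dictionary keys, and the sample text are hypothetical and not from the original files, and unlike the function above it does not slice off the header rows (that is left to the caller).

```python
import csv
import io

SECTION_MARKERS = ("[MASKS]", "[OUTLIERS]", "[MODIFIED]")

def split_sections(lines):
    """Group non-empty tab-delimited rows under the most recent section marker."""
    sections = {"data": [], "[MASKS]": [], "[OUTLIERS]": [], "[MODIFIED]": []}
    current = sections["data"]  # rows before any marker count as data
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        for marker in SECTION_MARKERS:
            if marker in row:
                current = sections[marker]  # switch the target list
                break
        # as in the function above, the marker row itself is stored too
        current.append(row)
    return sections

# Hypothetical sample input, made up for illustration
sample = "x\ty\n1\t2\n\n[MASKS]\nm1\tm2\n[OUTLIERS]\no1\to2\n[MODIFIED]\nz1\tz2\n"
sections = split_sections(io.StringIO(sample))
```

Each row is then tested against the markers once and appended to whichever list is current, so the chain of any() calls is not needed.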
On Fri, Feb 19, 2010 at 4:36 PM, Jonathan Gardner <jgardner at jonathangardner.net> wrote:
>
> On Fri, Feb 19, 2010 at 1:58 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
>
>> In reference to the several comments that "[x for x in read] is basically
>> a copy of the entire list. This isn't necessary." (and likewise list(read)):
>> I had thought there was a problem with passing iterators to takewhile().
>> I thought I had tested it and it didn't work. It seems I was wrong; it
>> clearly works. I'll make this change and see if it is any better.
>>
>> I actually don't plan to read them all in at once, only as needed, but I
>> do need the whole file in an array to perform some mathematics on them and
>> compare different files. So my interest was in making it faster to open them
>> as needed. The files are about 5 MB each, so I guess part of it might
>> be disk speed.
>>
>>
>
> Record your numbers in an array and then work your magic on them later.
> Don't store the entire file in memory, though.
>
> --
> Jonathan Gardner
> jgardner at jonathangardner.net
>
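
The quoted point that takewhile() works on an iterator directly, with no list copy, can be illustrated with a short sketch. The sample text here is made up for illustration. One detail worth knowing: takewhile() consumes the first row that fails the test (here the marker row), so that row is not seen again by whatever reads the iterator next.

```python
import csv
import io
from itertools import takewhile

# Hypothetical sample input, made up for illustration
sample = "x\ty\n1\t2\n[MASKS]\nm1\tm2\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")

# Pass the reader itself; no [x for x in reader] or list(reader) copy is needed
data = list(takewhile(lambda row: "[MASKS]" not in row, reader))

# The marker row was consumed by takewhile(), so this starts after it
rest = list(reader)
```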