speed question, reading csv using takewhile() and dropwhile()
MRAB
python at mrabarnett.plus.com
Fri Feb 19 15:02:10 EST 2010
Vincent Davis wrote:
> I have some (~50) text files that have about 250,000 rows each. I
> am reading them in using the following, which gets me what I want, but it
> is not fast. Is there something I am missing that should help? This is
> mostly a question to help me learn more about Python. It takes about 4
> min right now.
>
> def read_data_file(filename):
>     reader = csv.reader(open(filename, "U"), delimiter='\t')
>     read = list(reader)
>     data_rows = takewhile(lambda trow: '[MASKS]' not in trow,
>                           [x for x in read])
'takewhile' accepts an iterable, so "[x for x in read]" can be
simplified to "read".
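For example (a quick sketch, with made-up rows standing in for the parsed CSV contents):

```python
from itertools import takewhile

# Made-up rows; the '[MASKS]' marker matches the one in the post.
read = [['h1', 'h2'], ['d1', 'd2'], ['[MASKS]', ''], ['m1', 'm2']]

# takewhile accepts the list itself; no comprehension needed.
data_rows = takewhile(lambda trow: '[MASKS]' not in trow, read)
print(list(data_rows))  # [['h1', 'h2'], ['d1', 'd2']]
```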
>     data = [x for x in data_rows][1:]
>
    data = list(data_rows)[1:]
('takewhile' returns an iterator, which can't be sliced directly, hence
the list() call.)
>     mask_rows = takewhile(lambda trow: '[OUTLIERS]' not in trow,
>                           list(dropwhile(lambda drow: '[MASKS]' not in drow, read)))
>     mask = [row for row in mask_rows if row][3:]
>
No need to convert the result of 'dropwhile' to list.
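To illustrate with the same invented rows as above, the two can be chained lazily:

```python
from itertools import dropwhile, takewhile

# Invented rows; the markers match the ones in the post.
read = [['h1'], ['d1'], ['[MASKS]'], ['m1'], ['[OUTLIERS]'], ['o1']]

# dropwhile already yields rows one at a time, so takewhile can
# consume it directly without building an intermediate list.
mask_rows = takewhile(lambda trow: '[OUTLIERS]' not in trow,
                      dropwhile(lambda drow: '[MASKS]' not in drow, read))
print(list(mask_rows))  # [['[MASKS]'], ['m1']]
```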
>     outlier_rows = dropwhile(lambda drows: '[OUTLIERS]' not in drows, read)
>     outlier = [row for row in outlier_rows if row][3:]
>
The problem, as I see it, is that you're scanning the rows more than
once.
Is this any better?
def read_data_file(filename):
    reader = csv.reader(open(filename, "U"), delimiter='\t')
    data = []
    for row in reader:
        if '[MASKS]' in row:
            break
        data.append(row)
    data = data[1:]
    mask = []
    if '[MASKS]' in row:
        mask.append(row)
    for row in reader:
        if '[OUTLIERS]' in row:
            break
        if row:
            mask.append(row)
    mask = mask[3:]
    outlier = []
    if '[OUTLIERS]' in row:
        outlier.append(row)
    outlier.extend(row for row in reader if row)
    outlier = outlier[3:]
    return data, mask, outlier
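As a sanity check that the single-pass version matches the takewhile/dropwhile semantics, here's a small self-contained comparison on an invented tab-separated sample (the file contents and row values are made up, and both functions take an open file object rather than a filename so the sample can be fed in from memory):

```python
import csv
import io
from itertools import dropwhile, takewhile

# Invented sample: a header row, two data rows, then the two marker sections.
SAMPLE = (
    "h1\th2\n"
    "d1\td2\n"
    "d3\td4\n"
    "[MASKS]\t\n"
    "s1\ts2\n"   # the [3:] slice drops the marker row plus these two...
    "s3\ts4\n"
    "m1\tm2\n"   # ...so the mask keeps rows from here on
    "[OUTLIERS]\t\n"
    "t1\tt2\n"
    "t3\tt4\n"
    "o1\to2\n"
)

def read_original(f):
    # The multi-scan version from the question, lightly simplified.
    read = list(csv.reader(f, delimiter='\t'))
    data = list(takewhile(lambda r: '[MASKS]' not in r, read))[1:]
    mask_rows = takewhile(lambda r: '[OUTLIERS]' not in r,
                          dropwhile(lambda r: '[MASKS]' not in r, read))
    mask = [r for r in mask_rows if r][3:]
    outlier_rows = dropwhile(lambda r: '[OUTLIERS]' not in r, read)
    outlier = [r for r in outlier_rows if r][3:]
    return data, mask, outlier

def read_single_pass(f):
    # The single-scan rewrite from above.
    reader = csv.reader(f, delimiter='\t')
    data = []
    for row in reader:
        if '[MASKS]' in row:
            break
        data.append(row)
    data = data[1:]
    mask = []
    if '[MASKS]' in row:
        mask.append(row)
    for row in reader:
        if '[OUTLIERS]' in row:
            break
        if row:
            mask.append(row)
    mask = mask[3:]
    outlier = []
    if '[OUTLIERS]' in row:
        outlier.append(row)
    outlier.extend(r for r in reader if r)
    outlier = outlier[3:]
    return data, mask, outlier

assert read_original(io.StringIO(SAMPLE)) == read_single_pass(io.StringIO(SAMPLE))
print("both versions agree")
```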