what happens when the file being read is too big for all lines to be read with "readlines()"
Xiao Jianfeng
fdu.xiaojf at gmail.com
Sun Nov 20 05:49:42 EST 2005
bonono at gmail.com wrote:
>Xiao Jianfeng wrote:
>
>
>> First, I must say thanks to all of you. And I'm really sorry that I
>> didn't describe my problem clearly.
>>
>> There are many tokens in the file; every time I find a token, I have
>> to get the data on the next line and do some operation with it. It
>> should be easy for me to find just one token using the above method,
>> but there is more than one.
>>
>> My method was:
>>
>> f_in = open('input_file', 'r')
>> data_all = f_in.readlines()
>> f_in.close()
>>
>> for i in range(len(data_all)):
>>     line = data_all[i]
>>     if token in line:
>>         pass  # do something with data_all[i + 1]
>>
>> Since my method needs to read the whole file into memory, I think it
>> may not be efficient when processing very big files.
>>
>> I really appreciate all suggestions! Thanks again.
>>
>>
>>
>Something like this:
>
>for x in fh:
>    if has_token(x):
>        process(fh.next())
>
>You can also create an iterator with iter(fh), but I don't think that
>is necessary.
>
>This uses the iterator's "side effect" to your advantage: calling
>fh.next() inside the loop advances the file past the matched line, so
>the for loop never sees it. I was bitten by an iterator's side effect
>before, but for your particular app it becomes an advantage.
>
>
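Spelling that suggestion out, here is a minimal self-contained sketch of
the streaming approach (Python 2 style, matching the fh.next() call
above). has_token() and process() are hypothetical stand-ins, assumed
here to be a simple substring test and a print:

    def has_token(line, token='TOKEN'):
        # Assumption: the token is found by a plain substring test.
        return token in line

    def process(line):
        # Placeholder for the real work done on the data line.
        print line.strip()

    fh = open('input_file', 'r')
    try:
        for x in fh:
            if has_token(x):
                # fh.next() consumes the line after the token, so the
                # for loop resumes past it; a StopIteration here would
                # mean the token sat on the very last line.
                process(fh.next())
    finally:
        fh.close()

Only one line (plus the iterator's small read-ahead buffer) is held in
memory at a time, which is what makes this workable for files too big
for readlines().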
Thanks, all of you!

I have compared the two methods:
(1) "for x in fh:", iterating over the file line by line
(2) reading the whole file into memory first with readlines()

I tested them on two files, one of 80M and one of 815M. The first
method gained a speedup of about 40% on the first file and of about
25% on the second.
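For anyone who wants to reproduce a comparison like this, here is a
rough sketch of how the two runs could be timed. The file name, the
token, and the use of time.time() are illustrative assumptions, and the
token is assumed never to fall on the last line (fh.next() would raise
StopIteration there):

    import time

    token = 'TOKEN'          # assumed token for illustration
    filename = 'input_file'  # assumed file name

    # Method 1: iterate over the file line by line.
    start = time.time()
    fh = open(filename, 'r')
    for x in fh:
        if token in x:
            fh.next()        # consume the data line after the token
    fh.close()
    print 'streaming: %.2f s' % (time.time() - start)

    # Method 2: read the whole file into memory first.
    start = time.time()
    f_in = open(filename, 'r')
    data_all = f_in.readlines()
    f_in.close()
    for i in range(len(data_all)):
        if token in data_all[i]:
            pass             # data_all[i + 1] would be used here
    print 'readlines: %.2f s' % (time.time() - start)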
Sorry for my bad English; I hope I haven't confused anyone.
Regards,
xiaojf