[Tutor] Finding a specific line in a body of text

Robert Sjoblom robert.sjoblom at gmail.com
Mon Mar 12 05:46:39 CET 2012


> You haven't shown us the critical part: how are you getting the lines in
> the first place?

Ah, yes --
with open(address, "r", encoding="cp1252") as instream:
    for line in instream:

> (Also, you shouldn't shadow built-ins like list as you do above, unless
> you know what you are doing. If you have to ask "what's shadowing?", you
> don't :)
Maybe I should have said list_name.append() instead; sorry for that.

>> This, however, turned out to be unacceptably slow; this file is 1.1M
>> lines, and it takes roughly a minute to go through. I have 450 of
>> these files; I don't have the luxury to let it run for 8 hours.
>
> Really? And how many hours have you spent trying to speed this up? Two?
> Three? Seven? And if it takes people two or three hours to answer your
> question, and you another two or three hours to read it, it would have
> been faster to just run the code as given :)
Yes, for one set of files. Since I don't know how many sets of ~450
files I'll have to run this over, I think that asking for help was a
rather acceptable loss of time. I work on other parts while waiting
anyway, or try to find a solution on my own as well.

> - if you need to stick with Python, try this:
>
> # untested
> results = []
> fp = open('filename')
> for line in fp:
>    if key in line:
>        # Found key, skip the next line and save the following.
>        _ = next(fp, '')
>        results.append(next(fp, ''))

Well that's certainly faster, but not fast enough.
Oh well, I'll continue looking for a solution -- because even with the
speedup it's unacceptable. I'm hoping against hope that I only have to
run it against the last file of each batch of files, but if it turns
out that I don't, I'm in for some exciting days of finding stuff out.
Thanks for all the help though, it's much appreciated!
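
For reference, here is the quoted suggestion as a self-contained
function. It is only a sketch: the name lines_after_key and the skip
parameter are illustrative and not from the original code, and the
cp1252 default is simply the encoding used in the open() call earlier
in this message. It has not been timed against the real files.

```python
def lines_after_key(path, key, skip=1, encoding="cp1252"):
    """Collect the line that follows `skip` lines after each line containing `key`."""
    results = []
    # The with-statement closes the file even if an error occurs.
    with open(path, "r", encoding=encoding) as instream:
        for line in instream:
            if key in line:
                # Found key: discard the next `skip` lines ...
                for _ in range(skip):
                    next(instream, "")
                # ... and save the one after them ('' if the file ends first).
                results.append(next(instream, ""))
    return results
```

The for loop and the next() calls share the same file iterator, so the
skipped and saved lines are not tested for the key again; that matches
the behaviour of the quoted code.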

How do you approach something like this, when someone tells you "we
need you to parse these files. We can't tell you how they're
structured so you'll have to figure that out yourself."? It's just so
much text that it's hard to get a grasp on the structure, and
there's so much information contained in there as well; this is just
the first part of what I'm afraid will be many. I'll try not to bother
this list too much though.
-- 
best regards,
Robert S.

