[Tutor] how to parse a multiple character words from plaintext
Kent Johnson
kent37 at tds.net
Sun Feb 24 15:01:27 CET 2008
---- John Gunderman <meanburrito920 at yahoo.com> wrote:
> I am parsing the output of mork.pl, which is a Mork (the Mozilla format) parser. I don't know Perl, so I decided to write a Python script to do what I wanted, which is basically to create a dictionary listing each site and its corresponding values instead of outputting plaintext. Unfortunately, the output of mork.pl is 5000+ lines, so reading the whole document wouldn't be that efficient.
If you have enough memory for it to fit, reading the whole file at once is fine.
> Currently it uses:
> for line in history_file.readlines():
> but I don't know if this has to read all the lines before it goes through them.
Yes, readlines() reads the entire file.
> if it does, then would it be more efficient to use
> while line != '/t':
>     line = history_file.readline()
Probably not. But why so much emphasis on efficiency? Get the program working first. Only if it is too slow should you worry about efficiency. Processing a 5000-line file should not be a problem in Python.
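If memory ever does become a concern, a file object can be iterated over directly, which hands back one line at a time instead of building the whole list as readlines() does. A minimal sketch, using an in-memory stand-in for the real file (the contents and the count_lines name are made up for illustration):

```python
import io

# Stand-in for the real history file; these contents are made up.
history_file = io.StringIO("first line\nsecond line\nthird line\n")

def count_lines(f):
    """Walk the file one line at a time, without calling readlines()."""
    count = 0
    for line in f:  # the file object yields lines lazily
        count += 1
    return count

line_count = count_lines(history_file)
print(line_count)
```

With a real file, the same loop works on the object returned by open(). For a 5000-line file the difference is unlikely to be noticeable either way.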
> I was thinking of just appending each character to the string until it sees '/t', and then using int() on the string, but is there an easier way?
It would really help to see a sample of the data and the results you want from it. There are many ways to parse data in Python, from simple string operations to regular expressions to full-blown parsers. Without knowing what you want to do it is impossible to suggest an appropriate method.
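Purely as an illustration of the simple-string-operations end of that range, and guessing that '/t' in the question means the tab character '\t', here is a sketch that splits each line on tabs rather than appending one character at a time. The three-field layout (timestamp, visit count, URL) and the parse_history name are assumptions, not the actual mork.pl output:

```python
def parse_history(lines):
    """Build a dict mapping each URL to (timestamp, visit count).

    Assumes every line looks like: timestamp<TAB>count<TAB>url
    (a hypothetical layout; adjust it to the real data).
    """
    sites = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip lines that do not fit the assumed layout
        timestamp, count, url = fields
        sites[url] = (int(timestamp), int(count))
    return sites

# Made-up sample lines, not real mork.pl output.
sample = [
    "1203456789\t3\thttp://example.com/\n",
    "not a data line\n",
    "1203456999\t1\thttp://example.org/page\n",
]
sites = parse_history(sample)
print(sites)
```

split() does the work of scanning for the tab, and int() converts each numeric field in one step. If the real format is less regular than this, a regular expression may be a better fit.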
Kent