[Tutor] how to parse a multiple character words from plaintext
Kent Johnson
kent37 at tds.net
Sun Feb 24 15:21:10 CET 2008
---- John Gunderman <meanburrito920 at yahoo.com> wrote:
> I am parsing the output of mork.pl, which is a Mork (the Mozilla format) parser. I don't know Perl, so I decided to write a Python script to do what I wanted, which basically is to create a dictionary listing each site and its corresponding values instead of outputting into plaintext. Unfortunately, the output of mork.pl is 5000+ lines so reading the whole document wouldn't be that efficient.
OK, I looked briefly at mork.pl. You should be able to process its output line by line with something like this:
for line in history_file:
    if not line.strip():
        continue  # skip blank lines; may not be needed
    time, count, url = line.split()
    # do something with time, count, url
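
Since you want a dictionary of sites, here is a rough sketch of how the loop could fill one in. It assumes every non-blank line holds exactly three whitespace-separated fields in the order time, count, url, and that count is an integer. build_history is just a name I made up for illustration, and the sample lines are invented, not real mork.pl output.

```python
import io


def build_history(lines):
    """Map each URL to its (time, count) pair.

    Assumes every non-blank line has exactly three whitespace-separated
    fields in the order time, count, url.
    """
    history = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        time, count, url = line.split()
        # time is kept as a string; count is assumed to be an integer
        history[url] = (time, int(count))
    return history


# Invented sample data standing in for the real file
sample = io.StringIO(
    "1203866590\t3\thttp://www.python.org/\n"
    "\n"
    "1203866600\t1\thttp://example.com/\n"
)
history = build_history(sample)
```

If the same URL shows up on more than one line, the later line overwrites the earlier one; whether that is what you want depends on your data.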
Kent
> Currently it uses:
> for line in history_file.readlines():
> but I don't know if this has to read all the lines before it goes through them. If it does, would it be more efficient to use
> while line != '\t':
>     line = history_file.readline()
> I was thinking of just appending each character to the string until it sees '\t', and then using int() on the string, but is there an easier way?
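
Yes, readlines() reads the whole file into a list before the loop starts, while iterating over the file object itself hands you one line at a time. For 5000 lines either will work fine. There is no need to collect characters yourself: the tab character is written '\t' (backslash, not slash), and split('\t') returns the fields directly. A small sketch, using an invented line in place of the real file and assuming the second field is an integer:

```python
import io

# Stand-in for the real open file; the contents are invented
history_file = io.StringIO("1203866590\t3\thttp://www.python.org/\n")

for line in history_file:  # one line at a time, no full list built
    fields = line.rstrip("\n").split("\t")  # split on tab characters
    count = int(fields[1])  # assumes the second field is an integer
```

Calling split() with no argument splits on any run of whitespace, which also works as long as the fields themselves contain no spaces.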
>
> John
>
> ----- Original Message ----
> From: Kent Johnson <kent37 at tds.net>
> To: John Gunderman <meanburrito920 at yahoo.com>
> Cc: tutor at python.org
> Sent: Saturday, February 23, 2008 3:43:44 AM
> Subject: Re: [Tutor] how to parse a multiple character words from plaintext
>
> John Gunderman wrote:
> > I am looking to parse plaintext from a document. However, I am
> > confused about the actual methodology of it. This is because some of the
> > words will be multiple digits or characters. However, I don't know the
> > length of the words before the parse. Is there a way to somehow have
> > open() grab something until it sees a \t or ' '? I was thinking I could
> > have it count ahead the number of spaces till the stopping point and
> > then parse till that point using read(), but that seems sort of
> > inefficient. Is there a better way to pull this off? Thanks in advance.
>
> How big is the file? Can you just read the whole document and parse the
> resulting string? Or read by lines?
>
> Depending on how complex your parsing is, you might want to use
> pyparsing or one of the other Python parser libraries.
> http://pyparsing.wikispaces.com/
> http://nedbatchelder.com/text/python-parsers.html
>
> Kent
>