HTML Parser

C. Laurence Gonsalves clgonsal at
Sun Dec 31 04:42:44 CET 2000

On Fri, 29 Dec 2000 10:26:31 -0500, Voitenko, Denis <dvoitenko at> wrote:
>I am trying to write an HTML parser. I am starting off with a simple
>one like so:
>input_file =
>jsp_content = newline.split(input_file)

Two things, neither of which answers your question (others have already
done that...):

First, you don't need to use re to split a file into lines. You could've
just said:

jsp_content = file.readlines()

(note that this, like your existing code, reads the entire file into
memory, which might not be a good idea if your file is huge)
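If memory is a concern, one option is to iterate over the file object
itself, which yields one line at a time. This is only a sketch in current
Python 3 syntax; the function name count_matching_lines and its
parameters are made up for illustration and are not from the original
code:

```python
import re


def count_matching_lines(path, pattern):
    """Count the lines in `path` that match `pattern`.

    The file is read lazily, one line at a time, so the whole file
    never has to sit in memory at once.
    """
    regex = re.compile(pattern)
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # the file object is its own line iterator
            if regex.search(line):
                count += 1
    return count
```

This still treats the input as a sequence of lines, so it is fine for
line-oriented data but has the same limitation for HTML as any other
line-by-line approach.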

Second (this isn't Python-related), you probably don't want to split
your file into lines in any case. HTML is *not* a line-based language.
The following is a perfectly valid HTML tag:


Your code wouldn't work with such tags, since it works line-by-line.
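
To make that concrete, here is a rough sketch using html.parser from the
Python 3 standard library, which works on the text as a stream rather
than line by line. The sample markup, the TagCollector class, and the
collect_tags helper are all invented for this illustration; they are not
taken from the original code or message:

```python
from html.parser import HTMLParser

# Hypothetical input: one <a> tag whose attributes span three lines.
SAMPLE = """<p>See <a
    href="index.html"
    title="Home">the home page</a>.</p>
"""


class TagCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


def collect_tags(text):
    """Parse `text` as HTML and return a list of (tag, attributes) pairs."""
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return parser.tags
```

Calling collect_tags(SAMPLE) reports the <a> tag with both its href and
title attributes, even though no single line of the input contains the
whole tag. A per-line search for a complete tag would miss it.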

  C. Laurence Gonsalves                "Any sufficiently advanced
  clgonsal at                           technology is indistinguishable
                                        from magic." -- Arthur C. Clarke
