HTML Parser

Voitenko, Denis dvoitenko at
Fri Dec 29 10:26:31 EST 2000


I am trying to write an HTML parser. I am starting off with a simple one
like so:

import re
import string


file=open('C:\Documents and Settings\dvoitenko\My
Documents\Python\index.jsp', 'r')
input_file =

jsp_content = newline.split(input_file)

# loop thru lines ...
for line in jsp_content[:]:
	tag_content = line[result.start()+1:result.end()-1]
	print '<'+string.upper(tag_content)+'>'

Which simply uppercases all html tags. Well, it actually uppercases
everything else which I do not like. What seems to be wrong here? Say if I
have a link <a href=hello.jsp>Hello</a> it will result into <A
HREF=HELLO.JSP>HELLO</A>... which is not right.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the Python-list mailing list