Remove spaces and line wraps from html?

RiGGa rigga at hasnomail.com
Sat Jun 19 08:06:23 CEST 2004


Paramjit Oberoi wrote:

>>> http://groups.google.com/groups?q=HTMLPrinter&hl=en&lr=&ie=UTF-8&c2coff=1&selm=pan.2004.03.27.22.05.55.38448240hotmail.com&rnum=1
>>> 
>>> (or search c.l.p for "HTMLPrinter")
>>
>> Thanks, I forgot to mention I am new to Python so I don't yet know how to
>> use that example :(
> 
> Python has an HTMLParser module in the standard library:
> 
> http://www.python.org/doc/lib/module-HTMLParser.html
> http://www.python.org/doc/lib/htmlparser-example.html
> 
> It looks complicated if you are new to all this, but it's fairly simple
> really.  Using it is much better than dealing with HTML syntax yourself.
> 
> A small example:
> 
> --------------------------------------------------
> from HTMLParser import HTMLParser
> 
> class MyHTMLParser(HTMLParser):
>     def handle_starttag(self, tag, attrs):
>         print "Encountered the beginning of a %s tag" % tag
>     def handle_endtag(self, tag):
>         print "Encountered the end of a %s tag" % tag
> 
> my_parser=MyHTMLParser()
> 
> html_data = """
> <html>
>   <head>
>     <title>hi</title>
>   </head>
>   <body> hi </body>
> </html>
> """
> 
> my_parser.feed(html_data)
> --------------------------------------------------
> 
> will produce the result:
> Encountered the beginning of a html tag
> Encountered the beginning of a head tag
> Encountered the beginning of a title tag
> Encountered the end of a title tag
> Encountered the end of a head tag
> Encountered the beginning of a body tag
> Encountered the end of a body tag
> Encountered the end of a html tag
> 
> You'll be able to figure out the rest using the
> documentation and some experimentation.
> 
> HTH,
> -param
Thank you!! That was just the kind of help I was looking for.

Best regards

Rigga
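
For the question in the subject line, the parser approach above could be extended roughly as follows. This is only a sketch under some assumptions: it targets Python 3, where the module is imported as html.parser rather than HTMLParser, and the names WhitespaceCollapser and collapse_html are made up for illustration. It re-emits the tags as it sees them, drops text that is only whitespace, and collapses any other run of spaces and line wraps to a single space. Comments and doctype declarations are silently dropped, and the contents of pre, script and style elements are collapsed like any other text, so it is not safe for pages that depend on those.

```python
import re
from html.parser import HTMLParser


class WhitespaceCollapser(HTMLParser):
    """Hypothetical helper: rebuilds the markup with whitespace collapsed."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; reach the
        # handle_entityref/handle_charref hooks and can be written back as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>; overridden so no extra end tag is emitted.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Skip whitespace-only text; collapse other runs to one space.
        if data.strip():
            self.parts.append(re.sub(r"\s+", " ", data))

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def collapse_html(text):
    """Return text with spaces and line wraps between tags removed."""
    parser = WhitespaceCollapser()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


html_data = """
<html>
  <head>
    <title>hi</title>
  </head>
  <body> hi </body>
</html>
"""

print(collapse_html(html_data))
# prints: <html><head><title>hi</title></head><body> hi </body></html>
```

Dropping whitespace-only text also removes a single space between two inline elements (for example between a closing b tag and an opening i tag), which changes how the page renders; if that matters, handle_data would need to keep one space in that case.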


