getting infos from a website

Steve Holden sholden at holdenweb.com
Mon Apr 1 17:27:56 EST 2002


"Zutroi Zatatakowski" <abou at cam.org> wrote in message
news:mailman.1017521988.813.python-list at python.org...
>
>
> Zutroi Zatatakowski wrote:
> >
> > By the way, I can c.write() without any problem into the file though...
> > That's really puzzling me, I can write but cannot read. I know it's a
> > newbie problem but eh, so I am. :)
>
> Ok, it was pretty stupid, I was just not putting 'print' in front of it.
> But another thing... Now that I can capture a website html and output it
> into a file, I have to remove all html tags (I guess replacing '<>' by '
> ') or, but I don't know if it's possible, instead of capturing the HTML
> source of the page, could I retrieve only the text, like basic ASCII
> copy/paste?
>
You've got to the stage where you need to consider parsing the HTML. Take a
look at htmllib and sgmllib to see if you can work out how to do that. It
isn't as complicated as it seems once you've done it once ...
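
A minimal sketch of the idea, with one caveat. htmllib and sgmllib were Python 2 modules and were removed in Python 3. The sketch therefore swaps in their standard-library successor, html.parser.HTMLParser, which uses the same callback style: the parser calls handle_data() for each run of text between tags. The class name TextExtractor and the helper html_to_text() are hypothetical illustrations, not part of any library. The sketch also assumes you want to drop the contents of <script> and <style> elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the visible text of an HTML page, dropping tags."""

    # Assumption: text inside these elements is code, not page content.
    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns &amp;, &lt; etc. into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Called with the text between tags; keep it unless inside script/style.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the chunks, then collapse runs of whitespace left by the markup.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Hypothetical convenience wrapper: HTML source in, plain text out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return parser.text()


if __name__ == "__main__":
    page = ("<html><head><title>Demo</title><style>p{}</style></head>"
            "<body><p>Hello, <b>world</b> &amp; friends!</p></body></html>")
    print(html_to_text(page))  # Demo Hello, world & friends!
```

Joining chunks with a space keeps words from separate elements apart. The trade-off is that a word split by an inline tag (e.g. <b>wor</b>ld) comes out as two words. Replacing '<' and '>' with spaces, by contrast, would leave the tag names and attributes in the text, which is why a real parser is the better route.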

regards
 Steve