html tags and webpages

Thanos Vassilakis tvassila at siac.com
Thu Jul 17 15:31:00 EDT 2003


The problem you will have is that HTMLParser is not great at getting the CDATA
or PCDATA - that's the stuff between the tags.

Try pso http://sourceforge.net/projects/pso/
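
If you stay with HTMLParser, the text between the tags has to be collected
by hand. Below is a minimal sketch of that bookkeeping, written for current
Python 3, where the module is html.parser rather than HTMLParser. The names
TagTextCollector and text_inside are made up for this example and are not
part of any library.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every <tag>...</tag> element."""

    def __init__(self, tag):
        super().__init__(convert_charrefs=True)
        self.tag = tag.lower()
        self.depth = 0       # how many matching tags are currently open
        self.chunks = []     # text pieces of the element being read
        self.results = []    # finished text, one entry per outermost element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_data(self, data):
        # Nested tags interrupt the text, so it arrives in pieces
        # and has to be accumulated until the element closes.
        if self.depth > 0:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks).strip())
                self.chunks = []


def text_inside(html, tag):
    """Return the text content of each outermost <tag> element in html."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, text_inside(page, "p") returns one string per paragraph, with
any nested markup such as <b> dropped and only its text kept.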




Rene Pijlman <reply.in.the.newsgroup at my.address.is.invalid>
Sent by: python-list-admin at python.org
07/17/2003 02:43 PM

To:       python-list at python.org
cc:       (bcc: Thanos Vassilakis/SIAC)
Subject:  Re: html tags and webpages




jeff:
>Basically what i want to do is create a script in python that will
>look at a website and pull off a certain type of html tag and anything
>contained within them tags,

This is what you need:
http://www.python.org/doc/2.2.3/lib/module-urllib.html
http://www.python.org/doc/2.2.3/lib/module-HTMLParser.html
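
A rough sketch of how those two modules fit together for this task, written
for current Python 3, where they are named urllib.request and html.parser.
It fetches a page and rebuilds the markup of each matching tag, including
whatever is nested inside it. ElementGrabber, grab_elements and
grab_from_url are names invented for this example, and the utf-8 default
encoding is an assumption about the page being fetched.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ElementGrabber(HTMLParser):
    """Rebuild the markup of each <tag> element, nested tags included."""

    def __init__(self, tag):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tag = tag.lower()
        self.depth = 0       # how many matching tags are currently open
        self.parts = []      # markup pieces of the element being read
        self.elements = []   # finished elements, outermost matches only

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags inside a match are copied through as written.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.parts.append("</%s>" % tag)
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.elements.append("".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


def grab_elements(html, tag):
    """Return the markup of each outermost <tag> element in html."""
    parser = ElementGrabber(tag)
    parser.feed(html)
    parser.close()
    return parser.elements


def grab_from_url(url, tag, encoding="utf-8"):
    # Assumes the page decodes as `encoding`; undecodable bytes are replaced.
    with urlopen(url) as response:
        html = response.read().decode(encoding, errors="replace")
    return grab_elements(html, tag)
```

Calling grab_from_url(some_url, "table") would then give a list of strings,
one per table on the page. Badly nested HTML may confuse the depth counter,
so treat the result as a starting point.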

--
René Pijlman











