web page text extractor
Paul McGuire
ptmcg at austin.rr.com
Fri Jul 13 05:44:51 EDT 2007
On Jul 12, 4:42 am, kublai <restyc... at gmail.com> wrote:
> Hello,
>
> For a project, I need to develop a corpus of online news stories. I'm
> looking for an application that, given the url of a web page, "copies"
> the rendered text of the web page (not the source HTML text), opens a
> text editor (Notepad), and displays the copied text for the user to
> examine and save into a text file. Graphics and sidebars should be
> ignored. The examples I have come across are much too complex for me
> to customize for this simple job. Can anyone point me in the right
> direction?
>
> Thanks,
> gk
One of the examples provided with pyparsing is an HTML stripper - view
it online at http://pyparsing.wikispaces.com/space/showimage/htmlStripper.py.
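
For a rough idea of what such a stripper does, here is a minimal sketch.
It is not the pyparsing example itself: it swaps in the standard
library's html.parser so it runs without extra installs. The names
TextExtractor, html_to_text, and show_in_notepad are hypothetical,
chosen only for this illustration, and the tag lists are assumptions
about what counts as invisible or block-level markup.

```python
import re
import subprocess
import tempfile
import urllib.request
from html.parser import HTMLParser

# Assumption: contents of these tags are never shown as page text.
SKIP_TAGS = {"script", "style", "head", "noscript"}
# Assumption: these tags start a new line when rendered.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping markup and non-rendered elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t\r\f\v]+", " ", text)   # collapse runs of spaces
    text = re.sub(r" ?\n[ \n]*", "\n", text)    # collapse blank lines
    return text.strip()


def show_in_notepad(url):
    """Fetch url, strip the HTML, and open the result in Notepad (Windows)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html_to_text(html))
    subprocess.Popen(["notepad", handle.name])
```

Calling show_in_notepad with a story's URL would cover the fetch,
strip, and display steps. Note that this only removes markup; it makes
no attempt to tell a sidebar from the article body, so that part would
still need site-specific handling.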
-- Paul