parse HTML by class rather than tag
lorean2007 at yahoo.fr
Fri Feb 23 02:54:20 EST 2007
Hello,
I would be interested in parsing an HTML file by its corresponding
opening and closing tags, but taking into account the class
attributes and their values:
<html>
<body>
...
<div class="one">
...
<div class="two">
</div>
...
</div>
...
<div class="one">...</div>
<a href="..." class="three">
</body>
</html>
In this example, I need all the content inside the div with class="two",
or only the content inside the divs with class="one".
I am wondering whether I should use regular expressions, but I do not
think so, since I must jump past the inner closing div. The alternative
is a simple parser. I have searched and found
http://www.diveintopython.org/html_processing/basehtmlprocessor.html
but I would like the parser not to change anything at all (no
lowercasing).
Can you help?
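To make the requirement concrete, here is a rough sketch of the parser-based approach, not a definitive solution. It assumes Python 3's standard html.parser module (which postdates this post) and reasonably well-nested markup. The names ClassContentExtractor and content_by_class are made up for illustration. The parser is only used to find where the matching tags start and end; the content itself is sliced out of the original string, so nothing is lowercased or rewritten.

```python
from html.parser import HTMLParser


class ClassContentExtractor(HTMLParser):
    """Collect the untouched source text inside <tag class="..."> elements.

    Hypothetical helper for illustration; assumes well-nested markup.
    """

    def __init__(self, source, tag, class_name):
        super().__init__()
        self.source = source
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.class_name = class_name
        self.results = []
        # One entry per currently open `tag`: the offset where its content
        # starts if the class matched, otherwise None.
        self._stack = []
        # Absolute offset of the start of each line, to convert getpos().
        self._line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self._line_starts.append(i + 1)
        self.feed(source)
        self.close()

    def _offset(self):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        return self._line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            # Content begins right after the original start tag text.
            self._stack.append(self._offset() + len(self.get_starttag_text()))
        else:
            self._stack.append(None)

    def handle_endtag(self, tag):
        if tag != self.tag or not self._stack:
            return
        start = self._stack.pop()
        if start is not None:
            # Slice the original string, so case and spacing are preserved.
            self.results.append(self.source[start:self._offset()])


def content_by_class(source, class_name, tag="div"):
    """Return the raw inner text of every `tag` carrying `class_name`."""
    return ClassContentExtractor(source, tag, class_name).results


sample = (
    '<div class="one">A<DIV Class="two">B <b>x</b></DIV>C</div>\n'
    '<div class="one">D</div>'
)
inner_two = content_by_class(sample, "two")
inner_one = content_by_class(sample, "one")
```

The stack is what lets the parser skip past inner closing divs: every opening div pushes an entry and every closing div pops one, so a closing tag is always paired with its own opening tag rather than the nearest one.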
Best.
More information about the Python-list mailing list