[Web-SIG] HTML parsing - get text position and font size
manlio_perillo at libero.it
Mon Jan 12 15:11:04 CET 2009
Girish Redekar wrote:
> I'm trying to build a search engine in Python and am stuck at the place
> where I parse HTML to get useful text. One should ideally be able to
> parse the text (out of HTML tags) along with its position (for phrase
> searches) and font-size (to weigh words appropriately).
Word weighting should be based on semantics, not style.
However, if you really need it, for CSS parsing there is the cssutils package.
I'm writing a CSS parser, too:
using PLY, so it should be easy to read/modify.
It is still at a very early stage.
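
To illustrate what weighting by semantics rather than style could look like, here is a minimal sketch using only the standard library's html.parser. It records each word with its position (for phrase searches) and a weight taken from the enclosing tags. The names TAG_WEIGHTS, WeightedTextExtractor and extract_weighted_words, as well as the weight values, are assumptions made up for this example and do not come from any existing package.

```python
from html.parser import HTMLParser
import re

# Hypothetical weights per semantic tag; the numbers are illustrative only.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0,
               "strong": 1.5, "b": 1.5, "em": 1.2}
# Text inside these tags is not visible content, so it is ignored.
SKIPPED_TAGS = {"script", "style"}
# Void elements never get an end tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTextExtractor(HTMLParser):
    """Collect (position, word, weight) tuples from an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of currently open tags
        self.words = []      # (position, lowercased word, weight)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching tag; unmatched end tags
        # are ignored so that sloppy markup does not break the stack.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIPPED_TAGS for tag in self.open_tags):
            return
        # A word takes the highest weight among its enclosing tags.
        weight = max((TAG_WEIGHTS.get(tag, 1.0) for tag in self.open_tags),
                     default=1.0)
        for word in re.findall(r"\w+", data):
            # The position is the word's index in document order.
            self.words.append((len(self.words), word.lower(), weight))


def extract_weighted_words(html):
    """Parse an HTML string and return its list of weighted words."""
    parser = WeightedTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


if __name__ == "__main__":
    sample = ("<html><head><title>Search Engines</title></head>"
              "<body><h1>Phrase search</h1>"
              "<p>Plain <strong>bold</strong> text</p></body></html>")
    for position, word, weight in extract_weighted_words(sample):
        print(position, word, weight)
```

Consecutive positions make phrase matching a matter of comparing adjacent indexes, and the weight table can be tuned without touching the parser. Real-world pages may need a more forgiving parser than html.parser.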
Regards Manlio Perillo