[python-win32] Parse HTML String only not file
Tim Roberts
timr at probo.com
Thu Jun 17 21:06:49 CEST 2010
On 6/17/2010 11:09 AM, Mauricio Martinez Garcia wrote:
> Hi, how can I parse an HTML string?
> I need to parse the following line:
>
> '<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE><VALUE>none</VALUE></FIELD><FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE><VALUE>0</VALUE></FIELD>'
That's not HTML. It's XML. You CAN parse this with SGMLParser
(since XML is a simplified subset of SGML), but you might consider whether you
would be better served using xmllib, or even xml.sax.
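Here is a minimal sketch of the xml.sax route, written for Python 3, where xmllib and sgmllib no longer exist. It assumes the string is wrapped in an extra root element, here called ROOT (a name chosen only for illustration), because the fragment has two top-level FIELD elements and an XML parser requires exactly one document element. FieldHandler and parse_fields are likewise hypothetical names. The handler collects the values both as a flat list and as one dict per FIELD:

```python
import xml.sax

# The string from the question: two sibling <FIELD> elements and no single
# root, so it is not a well-formed XML document on its own.
FRAGMENT = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE>'
            '<VALUE>none</VALUE></FIELD>'
            '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE>'
            '<VALUE>0</VALUE></FIELD>')


class FieldHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of NAME/TYPE/VALUE elements."""

    LEAF_TAGS = ('NAME', 'TYPE', 'VALUE')

    def __init__(self):
        super().__init__()
        self.values = []       # flat list: every leaf value in document order
        self.fields = []       # one dict per <FIELD> element
        self._current = None   # leaf tag we are currently inside, if any
        self._text = []        # characters() may be called several times per element

    def startElement(self, name, attrs):
        if name == 'FIELD':
            self.fields.append({})
        elif name in self.LEAF_TAGS:
            self._current = name
            self._text = []

    def characters(self, content):
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._current:
            text = ''.join(self._text)
            self.values.append(text)
            if self.fields:    # only group values that sit inside a <FIELD>
                self.fields[-1][name.lower()] = text
            self._current = None


def parse_fields(fragment):
    handler = FieldHandler()
    # Wrap the fragment in an arbitrary root element (ROOT is an assumption)
    # so that the SAX parser sees exactly one document element.
    document = '<ROOT>' + fragment + '</ROOT>'
    xml.sax.parseString(document.encode('utf-8'), handler)
    return handler.values, handler.fields


if __name__ == '__main__':
    values, fields = parse_fields(FRAGMENT)
    print(values)
    print(fields)
```

With the string from the question, values comes out as ['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0'], and fields holds one dict per FIELD, keyed by 'name', 'type' and 'value'.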
> The result of the program is:
>
> bash-3.1$ ./pruebasDOM.py
> ['BSCS status']
> ['string']
> ['none']
> ['TopCre_life']
> ['integer']
> ['0']
>
>
> I can't get the data into a single dict() or list []. I need all the values:
> ['BSCS status', 'string', 'none', 'TopCre_life', 'integer', '0']
>
> What can I do?
Of course. Just change your ParserHTML class to create an empty list in
__init__, then append each value you receive in handle_data to that list
instead of printing it. So, for example:
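from sgmllib import SGMLParser   # Python 2 only; sgmllib is gone in Python 3

class ParserHTML(SGMLParser):
    def __init__(self):
        SGMLParser.__init__(self)
        self.results = []        # collects the text values, in document order
    ...
    def handle_data(self, data):
        ...
        self.results.append(data)    # store the value instead of printing it
    ...

if __name__ == '__main__':
    ...
    p = ParserHTML()
    p.feed(node)
    p.close()                    # flush any data the parser is still buffering
    print p.results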
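sgmllib, and with it SGMLParser, was removed in Python 3. As a rough Python 3 substitute (not the original class), here is a sketch of the same list-collecting idea built on html.parser.HTMLParser, which offers the same handle_data callback. The node string is assumed to be the line from the question, and whitespace-only text runs are skipped:

```python
from html.parser import HTMLParser


class ParserHTML(HTMLParser):
    """Collects every non-blank text run instead of printing it."""

    def __init__(self):
        super().__init__()   # convert_charrefs defaults to True in Python 3.5+
        self.results = []    # text values, in document order

    def handle_data(self, data):
        # With convert_charrefs=True, text between two tags is normally
        # delivered in a single call, so 'BSCS status' stays one item.
        if data.strip():
            self.results.append(data)


if __name__ == '__main__':
    # Assumed input: the line from the original question.
    node = ('<FIELD><NAME>BSCS status</NAME><TYPE>string</TYPE>'
            '<VALUE>none</VALUE></FIELD>'
            '<FIELD><NAME>TopCre_life</NAME><TYPE>integer</TYPE>'
            '<VALUE>0</VALUE></FIELD>')
    p = ParserHTML()
    p.feed(node)
    p.close()                # flush any text still buffered at end of input
    print(p.results)
```

HTMLParser lowercases tag names, which does not matter here because only the text is collected.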
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.