
Hello, and many thanks for the lxml module!

I am discovering its capabilities, but I am struggling to finish my first parser because of two issues.

First, I have a valid file.html to parse. Here is an example:

    <html>
    <body>
    <tbody>
    <tr>
    <td>
    <table id="tbHeaderImages">
    <span class="Title">
    BLO BLO
    <span style="color:red; font-weight:bold">BLU BLU</span>
    BLA BLA
    </span>
    <br>
    <br>
    Some texttext
    <br>
    ...
    </table>
    </tbody>
    </body>
    </html>

I have many documents, each inside a table:

    for document in TREE.xpath('/html/body/table'):

Then I want to extract each title:

    title = document.xpath("./tr/td/span[@class = 'Title']/text()")

Problem 1: I get "BLO BLO" and "BLA BLA", whereas I would like "BLO BLO", "BLU BLU" and "BLA BLA". I have heard that I should take the text value from the first node, then get the second node itself (not its text value) and use its .text attribute to retrieve the remaining text. But I do not really understand how to do that. Could you please give me an example?

Problem 2: I also want to extract "Some texttext", which comes just after the span with class "Title". After that, I have to trigger an event, in a SAX-like manner. Could you tell me how to do this while I am already iterating over TREE.xpath('/html/body/table')?

Many thanks for your help,

-- 
Alexandre Delanoë
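
For illustration, here is a minimal sketch of the two extractions asked about above. It is not a definitive answer and rests on several assumptions: the sample markup is simplified into a well-formed table (the SAMPLE string below is hypothetical), the document is parsed with lxml.html, and the helper names title_parts and text_after are invented for this example. The idea is that XPath text() only selects text nodes that are direct children of the span, while .//text() also descends into the nested span; the text following an element is stored in the .tail of that element and of its following siblings.

```python
from lxml import html

# Hypothetical, simplified version of the sample: one well-formed table per
# "document", with the title span inside a cell.
SAMPLE = """<html><body>
<table id="tbHeaderImages"><tr><td>
<span class="Title">BLO BLO <span style="color:red; font-weight:bold">BLU BLU</span> BLA BLA</span>
<br><br>
Some texttext
<br>
</td></tr></table>
</body></html>"""


def title_parts(span):
    """Return every non-empty text piece inside `span`, nested elements included."""
    # text() would only match direct text children of the span;
    # .//text() also matches the text of the inner <span>.
    return [t.strip() for t in span.xpath(".//text()") if t.strip()]


def text_after(element):
    """Return the non-empty text chunks that follow `element` among its siblings."""
    chunks = []
    # Text directly after an element is stored in that element's .tail.
    if element.tail and element.tail.strip():
        chunks.append(element.tail.strip())
    # Text after each following sibling (here, the <br> elements) is in its .tail.
    for sibling in element.itersiblings():
        if sibling.tail and sibling.tail.strip():
            chunks.append(sibling.tail.strip())
    return chunks


tree = html.fromstring(SAMPLE)
results = []
for document in tree.xpath("//table"):
    for span in document.xpath(".//span[@class='Title']"):
        results.append((title_parts(span), text_after(span)))

print(results)
```

The relative paths (//table, .//span) are used here only because the exact nesting of the real files is not known from the sample; they can be tightened to absolute paths once the structure is confirmed. The SAX-like event from Problem 2 is not covered by this sketch.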