incorrect handling of dt/dl in parsing html? - lxml - The Python XML Toolkit

Oct. 20, 2017

g'day

I'm trying to parse the "usual" Netscape-format bookmarks.html produced
by Firefox (and, recently, Opera). The tree lxml builds is different
from the one I expect, and from the one both Firefox and Opera build
when they parse the same file.

(Sub)folders there are represented as a [dl] inside a [dt], but lxml puts
the [dl] next to the [dt], on the same level as the [dt].

Is this a mistake or something intentional?

ciao
svil

(angle brackets are replaced with [] in this message)

[DL][p]
    [DT][A HREF= "hrf1/" ] name1[/A]
    [DD]
    [DT][A HREF= "hrf2"] name2[/A]
    [DD]
    [DT][H3] folder1[/H3]
    [DL][p]
        [DT][A HREF= "hrf11"] name3[/A]
        [DD]dd1
        [DT][H3] folder2[/H3]
        [DL][p]
            [DT][A HREF= "hrf22"] name4[/A]
            [DD]dd2
        [/DL][p]
        [DT][A HREF= "hrf33"] name5[/A]
        [DD]dd3
    [/DL][p]
[/DL]
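
A possible workaround, sketched here under assumptions and not taken from lxml: since the folder structure is carried by the order of the [H3] and [DL] tags, it can be rebuilt from the tag stream alone, without relying on whether the parser nests the [dl] inside the [dt]. The sketch below uses the standard-library html.parser; the names BookmarkParser and parse_bookmarks, the dict layout, and the shortened sample are all made up for illustration. It assumes every folder's [DL] directly follows its [H3].

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Rebuilds the folder tree from the order of H3/DL/A tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = {"title": None, "children": []}
        self.stack = [self.root]  # open folders, innermost last
        self.pending = None       # folder named by the last H3, waiting for its DL
        self.current = None       # link or folder whose title text is being collected

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names
        if tag == "h3":
            folder = {"title": "", "children": []}
            self.stack[-1]["children"].append(folder)
            self.pending = folder
            self.current = folder
        elif tag == "a":
            link = {"title": "", "href": dict(attrs).get("href")}
            self.stack[-1]["children"].append(link)
            self.current = link
        elif tag == "dl":
            # A DL after an H3 opens that folder; the outermost DL has no H3,
            # so the enclosing folder is pushed again to keep pops balanced.
            self.stack.append(self.pending or self.stack[-1])
            self.pending = None

    def handle_data(self, data):
        if self.current is not None:
            self.current["title"] += data

    def handle_endtag(self, tag):
        if tag in ("a", "h3") and self.current is not None:
            self.current["title"] = self.current["title"].strip()
            self.current = None
        elif tag == "dl" and len(self.stack) > 1:
            self.stack.pop()


def parse_bookmarks(text):
    """Parse bookmarks written in the [] notation of this message."""
    parser = BookmarkParser()
    parser.feed(text.replace("[", "<").replace("]", ">"))
    parser.close()
    return parser.root


# Shortened variant of the sample above, still in [] notation
SAMPLE = """
[DL][p]
    [DT][A HREF="hrf1/"]name1[/A]
    [DT][H3]folder1[/H3]
    [DL][p]
        [DT][A HREF="hrf11"]name3[/A]
        [DD]dd1
        [DT][H3]folder2[/H3]
        [DL][p]
            [DT][A HREF="hrf22"]name4[/A]
        [/DL][p]
        [DT][A HREF="hrf33"]name5[/A]
    [/DL][p]
[/DL]
"""


def walk(node, depth=0):
    """Print the tree with one indentation step per folder level."""
    for child in node["children"]:
        print("  " * depth + child["title"])
        if "children" in child:
            walk(child, depth + 1)


if __name__ == "__main__":
    walk(parse_bookmarks(SAMPLE))
```

This sidesteps the question of which tree is "correct" for the markup; it only recovers the folder nesting that the browsers show. The [DD] descriptions are ignored in this sketch.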
