LXML getchildren() method does not return a child div tag with style="display:none" - lxml - The Python XML Toolkit

Feb. 28, 2019

I was using BeautifulSoup to parse the page at:
https://irs.thsrc.com.tw/IMINT/
When the page is rendered in a browser, a confirmation dialog pops up in front of all other content and asks the user to confirm (its button, id="btn-confirm", is labeled "我同意", i.e. "I agree").
The dialog is the tag at XPath /html/body/div[1]/form/div[2], inside the page's only form.
Both this tag and the preceding one at /html/body/div[1]/form/div[1] have a style attribute containing "display:none".
The odd thing is that when I call getchildren() on the form element to get its children, the second div is missing from the returned list.
In other words, lxml skips that tag entirely.
I tried BeautifulSoup as well, but the problem is the same: the .children attribute of the form tag does not contain the second div either.
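
One way to narrow this down is to parse a trimmed copy of the markup and list the form's children directly. lxml does not interpret CSS, so a style of "display:none" has no influence on which children getchildren() or list() returns. The sketch below uses a hypothetical minimal stand-in for the real page (not the page itself); if both hidden divs appear here, the difference more likely lies in the HTML that was actually parsed or in the code that walks the tree.

```python
from lxml import html

# Hypothetical trimmed markup: only the two hidden divs and the content div
# of the form are kept; the real page has many more attributes and tags.
DOC = """
<html><body><div>
<form id="BookingS1Form" method="post">
  <div style="display:none"><input type="hidden" name="BookingS1Form:hf:0"></div>
  <div style="display:none; padding:3px 10px 5px;" id="dialogCookieInfo"></div>
  <div id="content" class="content"></div>
</form>
</div></body></html>
"""

root = html.fromstring(DOC)
form = root.get_element_by_id("BookingS1Form")

# list(form) is the recommended replacement for the deprecated getchildren().
children = list(form)
for child in children:
    print(child.tag, child.get("id"), child.get("style"))

# The XPath from the question, evaluated on the parsed tree.
second = root.xpath("/html/body/div[1]/form/div[2]")
print(second[0].get("id") if second else "not found")
```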

Below I have copied part of the HTML and the whole method that recursively scans all tags.
Some Unicode text that could not pass the Stack Overflow filter has been changed to ASCII.

I would very much appreciate it if anyone could tell me how to get the pop-up tag that is waiting for the confirmation.
Thanks

    <html>
      <head>
        <title> taiwan hsrc </title>
      </head>
      <body topmargin="0" rightmargin="0" bottommargin="0" bgcolor="#FFFFFF" leftmargin="0">
      <!----- error message ends ----->
      <form action="/IMINT/;jsessionid=4A74C40B8D68474DF0B6F49E953DD825?wicket:interface=:0:BookingS1Form::IFormSubmitListener" id="BookingS1Form" method="post">
        <div style="display:none">
          <input type="hidden" name="BookingS1Form:hf:0" id="BookingS1Form:hf:0" />
        </div>
        <div style="display:none; padding:3px 10px 5px;text-align:center;" id="dialogCookieInfo" title="Taiwan high-speed rail" wicket:message="title=bookingdialog_3">
          <div class="JCon">
            <div class="TCon">
              <div class="overDiffText">
                <div style="text-align: left;">
                  <span>for better service
                    <a target="_blank" class="c" style="color:#FF9900;" href="https://www.thsrc.com.tw/tw/Article/ArticleContent/d1fa3bcb-a016-47e2-88c6-7b7cbed00ed5?tabIndex=1">
                      privacy
                    </a>
                    。
                  </span>
                </div>
              </div>
              <div class="action">
                <table border="0" cellpadding="0" cellspacing="0" align="center">
                  <tr>
                    <td>
                      <input hidefocus="true" name="confirm" id="btn-confirm" type="button" class="button_main" value="我同意"/>
                    </td>
                  </tr>
                </table>
              </div>
            </div>
          </div>
        </div>
        <div id="content" class="content">
          <!----- marquee starts ----->
          <marquee id="marqueeShow" behavior="scroll" scrollamount="1" direction="left" width="755">
          </marquee>
          <!----- marquee ends ----->
          <div class="tit">
            <span>一般訂票</span>
          </div>
      </form>
      </div>
      </body>
    </html>
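
Given markup shaped like the snippet above, the dialog can also be fetched directly by its id instead of walking the whole tree. This is a sketch that assumes the HTML handed to the parser actually contains this markup (no lookup can find a tag that is absent from the downloaded HTML); the trimmed copy below keeps only the ids and styles the lookup relies on, and the button label is replaced by the ASCII "I agree" for illustration.

```python
from lxml import html

# Trimmed copy of the dialog markup above (assumption: only the ids and
# styles matter for this lookup; the rest of the page was dropped).
PAGE = """
<html><body><div>
<form id="BookingS1Form" method="post">
  <div style="display:none"><input type="hidden" name="BookingS1Form:hf:0"></div>
  <div style="display:none; padding:3px 10px 5px;text-align:center;" id="dialogCookieInfo">
    <div class="JCon"><div class="TCon"><div class="action">
      <input name="confirm" id="btn-confirm" type="button" class="button_main" value="I agree">
    </div></div></div>
  </div>
</form>
</div></body></html>
"""

root = html.fromstring(PAGE)

# Look the dialog up by id; an inline display:none does not affect parsing.
dialog = root.get_element_by_id("dialogCookieInfo")

# The confirm button is a descendant of the dialog.
confirm = dialog.xpath('.//input[@id="btn-confirm"]')[0]

print(dialog.get("style"))
print(confirm.get("name"), confirm.get("value"))
```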

My lxml code for scanning the HTML is the following:

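    # Assumed context: a classmethod of a class that defines `_clickable_tags`,
    # with module-level globals `count`, `countLabelActionableInside` (ints)
    # and `xmlTree` (the lxml ElementTree being scanned).
    @classmethod
    def actionableLXML(cls, e):
        # Both counters are module globals; without declaring
        # countLabelActionableInside here, "+= 1" raises UnboundLocalError.
        global count, countLabelActionableInside
        count += 1  # number each visited element in the trace output
        print("rec[", count, "], xpath: ", xmlTree.getpath(e))
        countLabelActionableInside += 1
        # An element is actionable if it is clickable, an <input> or a <select>,
        # or if any of its descendants is actionable.
        flagActionableInside = (e.tag in cls._clickable_tags
                                or e.tag in ('input', 'select'))
        # list(e) replaces the deprecated e.getchildren(); it takes a snapshot,
        # so children removed during the recursion do not disturb the loop.
        for c in list(e):
            flagActionableInside |= cls.actionableLXML(c)
        # Comments and processing instructions return None from .get().
        style = e.get('style') or ''
        # Drop hidden (display:none) subtrees that contain nothing actionable.
        if 'display:' in style and 'none' in style and not flagActionableInside:
            e.getparent().remove(e)
        return flagActionableInside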
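
The hidden-element test in actionableLXML checks for the substrings "display:" and "none" independently, so an inline style such as "display:block; background:none" would also be treated as hidden. A stricter check could be sketched as follows; is_hidden is a hypothetical helper (not part of lxml), and it only looks at the inline style attribute, not at CSS coming from stylesheets or classes.

```python
import re

# Matches a "display: none" declaration in an inline style string, allowing
# surrounding whitespace, any letter case and an optional "!important".
_DISPLAY_NONE = re.compile(
    r"(?:^|;)\s*display\s*:\s*none\s*(?:!important\s*)?(?:;|$)", re.I
)


def is_hidden(style):
    """Return True if the inline style string declares display:none."""
    return bool(style) and _DISPLAY_NONE.search(style) is not None


print(is_hidden("display:none; padding:3px 10px 5px;text-align:center;"))
print(is_hidden("display:block; background:none"))
```

With such a helper, the removal condition in the method would read `if is_hidden(e.get('style')) and not flagActionableInside:`.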

王凡

Stefan Behnel

participants (2)