Question about problem with NCR entities in lxml under PyPy - lxml - The Python XML Toolkit

Sept. 30, 2015

I have a simple test that fails using lxml 3.4.4 running under PyPy 2.6.1.
It succeeds as expected under CPython 2.7.10.

The two sample XML blocks are identical except that the failing sample
includes a numeric character reference (NCR) for an en space (U+2002).
The samples and the test are included at the bottom of this email. Under
PyPy, sample2_etree is parsed and queried successfully, but the xpath()
call on sample_etree fails, so bar is never assigned for it.
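For reference, here is a minimal sketch of the result I would expect from a working parser. It uses the standard library's xml.etree.ElementTree rather than lxml, so that it runs without lxml installed. The name SAMPLE_WITH_NCR, the helper collect_text, the decimal form of the reference, and its position inside the element are all assumptions made for illustration; they are not copied from the failing sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the failing sample: the same kind of element,
# with the en space written as a numeric character reference.
# &#8194; is the decimal reference for U+2002 (EN SPACE); where it sits in
# the text is an assumption.
SAMPLE_WITH_NCR = (
    '<name xmlns="http://www.epo.org/exchange">'
    'NEXT&#8194;COMPUTER INC [US]</name>'
)


def collect_text(xml_source):
    """Parse the document and return its text nodes as a list.

    itertext() is used here as a rough stdlib analogue of the XPath
    expression './/text()'; it is not the lxml call from the test.
    """
    root = ET.fromstring(xml_source)
    return list(root.itertext())


# The parser should resolve the reference to the actual U+2002 character.
text_nodes = collect_text(SAMPLE_WITH_NCR)
```

Here text_nodes should hold a single string in which the reference has been replaced by the en space character itself.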

Is there any testing of these characters in the lxml test suite, or are
there any suggested workarounds? The problem I encounter is that any use of
element.xpath(".//text()") on XML that contains an NCR fails with a
StackOverflow error (full traceback below).

Thanks.

- Jeff Doran

-------
TEST
-------
sample="""<name xmlns="http://www.epo.org/exchange" xmlns:ops="
http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">NEXT
COMPUTER INC [US]</name>
"""
sample2="""<name xmlns="http://www.epo.org/exchange" xmlns:ops="
http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">NEXT
COMPUTER INC [US]</name>
"""

def simple_xpath_test():
    import lxml.etree  # a plain "import lxml" does not load the etree submodule

    sample2_etree = lxml.etree.fromstring(sample2)
    print "____full sample2 %r", sample2

    bar = sample2_etree.xpath('.//text()')
    print "____bar = %r", bar
    assert (bar is not None)

    sample_etree = lxml.etree.fromstring(sample)
    print "____full sample %r", sample

    bar = sample_etree.xpath('.//text()')
    print "____bar = %r", bar
    assert (bar is not None)

======================================================================
ERROR: bias.tests.epo.test_epo_patent.simple_xpath_test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jeff/lexmachina/deus_lex/.tox/pypy/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/jeff/lexmachina/deus_lex/bias/bias/tests/epo/test_epo_patent.py", line 45, in simple_xpath_test
    bar = sample_etree.xpath('.//text()')
  File "lxml.etree.pyx", line 1507, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:52198)
  File "xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:152124)
SystemError: <StackOverflow object at 0x7fbc2f3167b0>

-------------------- >> begin captured stdout << ---------------------

____full sample2 %r <name xmlns="http://www.epo.org/exchange" xmlns:ops="
http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">NEXT
COMPUTER INC [US]</name>
____bar = %r ['NEXT COMPUTER INC [US]']

____full sample %r <name xmlns="http://www.epo.org/exchange" xmlns:ops="
http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">NEXT
COMPUTER INC [US]</name>

--------------------- >> end captured stdout << ----------------------
