Partly erratic wrong behaviour, Python 3, lxml

Jussi Piitulainen jpiitula at ling.helsinki.fi
Fri Mar 5 15:43:47 CET 2010


Stefan Behnel writes:
> Jussi Piitulainen, 04.03.2010 22:40:
> > Stefan Behnel writes:
> >> Jussi Piitulainen, 04.03.2010 11:46:
> >>> I am observing weird semi-erratic behaviour that involves Python 3
> >>> and lxml, is extremely sensitive to changes in the input data, and
> >>> only occurs when I name a partial result. I would like some help
> >>> with this, please. (Python 3.1.1; GNU/Linux; how do I find lxml
> >>> version?)
> >>
> >> Here's how to find the version:
> >>
> >> http://codespeak.net/lxml/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do
> >
> > Ok, thank you. Here's the results:
> >
> > >>> print(et.LXML_VERSION, et.LIBXML_VERSION,
> > ...       et.LIBXML_COMPILED_VERSION, et.LIBXSLT_VERSION,
> > ...       et.LIBXSLT_COMPILED_VERSION)
> > (2, 2, 4, 0) (2, 6, 26) (2, 6, 26) (1, 1, 17) (1, 1, 17)
> 
> I can't reproduce this with the latest lxml trunk (and Py3.2 trunk)
> and libxml2 2.7.6, even after running your test script for quite a
> while. I'd try to upgrade the libxml2 version.

Thank you very much. I suppose that is good news. It's a big server with
many users - I will ask the administrators to consider an upgrade when
I get around to it.

It turns out that the lxml documentation warns against using libxml2
version 2.6.27 with XPath, and that is just a bit newer than what we
have. On that cue, I seem to have found a workaround: I replaced the
xpath expression with findall(titlef) where

    titlef = ( '//{http://www.openarchives.org/OAI/2.0/}record'
               '//{http://purl.org/dc/elements/1.1/}title' )

In the previously broken naming() function I now have:

        result = etree.parse(BytesIO(body))
        n = len(result.findall(titlef))

And in the previously working nesting() function:

        n = len(etree.parse(BytesIO(body)).findall(titlef))

With these changes, the test script consistently gives the result
that I expect, and the more complicated real script where I first met
the problem also appears to work without a hitch. So, this works.
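
For anyone who wants to try the findall() approach in isolation, here
is a minimal sketch. The sample body and the name count_titles are
made up for illustration and are not from my test script. The Dublin
Core namespace is written with its trailing slash, as it is
registered. The import falls back to the standard library's
xml.etree.ElementTree when lxml is not installed, and the path uses
the relative './/' form because the standard-library module is less
accommodating of a leading '//'.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module accepts the same Clark-notation path.
    import xml.etree.ElementTree as etree

OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Namespaces go in braces in front of the local name (Clark notation).
titlef = ".//{%s}record//{%s}title" % (OAI, DC)

# Hypothetical, heavily trimmed OAI-PMH response with two records.
body = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">First</dc:title>
    </metadata></record>
    <record><metadata>
      <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Second</dc:title>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""


def count_titles(data):
    """Parse the bytes and count the title elements under any record."""
    tree = etree.parse(BytesIO(data))
    return len(tree.findall(titlef))


print(count_titles(body))
```

This only exercises findall(); it does not attempt to reproduce the
erratic xpath behaviour.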

The other, broken behaviour is totally scary, though.


