Fwd: validation error message consistency across Python versions

Hi: I'm finding different behaviour when trying to validate XML documents with lxml 4.3.0 in Python 2.7 and 3.x environments.

Example validation error using lxml 4.3.0 and Python 2.7.15, libxml2 2.9.3+dfsg1-1ubuntu0.6 (xenial):

    Error: Element '{http://www.opengis.net/foo}Filter': This element is not expected. Expected is one of ( {http://www.opengis.net/ogc}Filter, {http://www.opengis.net/cat/csw/2.0.2}CqlText ). (line 0)

Example validation error using lxml 4.3.0 and Python 3.4.8, libxml2 2.9.3+dfsg1-1ubuntu0.6 (xenial):

    Error: Element '{http://www.opengis.net/foo}Filter': This element is not expected. Expected is one of ( {http://www.opengis.net/ogc}Filter, {http://www.opengis.net/cat/csw/2.0.2}CqlText ). (<string>, line 0)

As a result, our (strict) functional tests fail. Any idea what can be done to keep error messages consistent?

Thanks
..Tom
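On the testing side, a possible workaround (not one proposed in this thread) is to strip the trailing location suffix, `(line 0)` versus `(<string>, line 0)`, before comparing messages. A minimal sketch, assuming the location is always the last parenthesised group of the message; `normalize_validation_error` is a hypothetical helper name:

```python
import re

# Matches a trailing location suffix such as " (line 0)" or " (<string>, line 0)".
# Assumption: the source/line information is always the final parenthesised group.
_LOCATION_SUFFIX = re.compile(r"\s*\((?:[^()]*,\s*)?line \d+\)\s*$")


def normalize_validation_error(message):
    """Drop the source/line suffix so messages compare equal across Python versions."""
    return _LOCATION_SUFFIX.sub("", message)
```

The inner `[^()]*` cannot cross parentheses, so the `( ... Filter, ... CqlText )` list inside the message itself is left untouched.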

Tom Kralidis wrote on 22.01.19 at 14:32:
Just guessing without knowing your code, but it might be that you are passing an open file object into parse(). If so, just pass the file path as a string instead. It's more efficient and probably also avoids this problem.

The background is that file objects are parsed through the Python API in Python 3, but can use a shortcut at the C level in some cases in Python 2. That can lead to the parser knowing different things about the source at the point where errors are reported. Parsing from file objects also hinders thread parallelism.

Stefan
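The suggestion amounts to handing parse() a path instead of an open file. A minimal sketch; `load_document` is a hypothetical name, and the fallback to the standard library's ElementTree exists only so the sketch runs without lxml (the C-level file reading described above applies to lxml, not to the fallback):

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib so the sketch still runs without lxml.
    import xml.etree.ElementTree as etree


def load_document(path):
    """Parse an XML file from its path (a str) rather than from an open file object."""
    # Preferred: etree.parse(path) lets the parser open and read the file itself.
    # Avoided:   etree.parse(open(path, "rb")), which reads through Python file calls.
    return etree.parse(path)
```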

On Sun, Jan 27, 2019 at 12:55 PM Stefan Behnel <stefan_ml@behnel.de> wrote:
Looking deeper, it's rooted in etree.fromstring:

https://travis-ci.org/geopython/pycsw/jobs/483194973
https://github.com/geopython/pycsw/blob/master/pycsw/ogc/csw/csw2.py#L1628

where the string is read from a wsgi.input file object:

https://github.com/geopython/pycsw/blob/master/pycsw/server.py#L237

So although etree.fromstring is being used, does the fact that wsgi.input (a file handle) is involved constitute a potential issue?
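For context, the pattern described here (read the request body from wsgi.input, then hand the bytes to fromstring()) looks roughly like the sketch below. The helper names are hypothetical and this is not pycsw's actual code; it illustrates that by the time fromstring() runs, it only receives a bytes object with no trace of the file handle it came from. As before, the ElementTree fallback is an assumption so the sketch runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without lxml.
    import xml.etree.ElementTree as etree


def read_request_body(environ):
    """Read the raw POST body from a WSGI environ (hypothetical helper)."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # PEP 3333: an application should not read past CONTENT_LENGTH from wsgi.input.
    return environ["wsgi.input"].read(length) if length > 0 else b""


def parse_request(environ):
    """Parse the request body; fromstring() only ever sees the bytes."""
    return etree.fromstring(read_request_body(environ))
```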

Tom Kralidis wrote on 28.01.19 at 00:53:
Hmm, no, fromstring() does not know (or care about) the source of the string, unless you pass an explicit "base_url". Also, the escaping ("<") seems weird.

Honestly, I have no idea where this difference could come from. I couldn't find any code in lxml or libxml2 that could lead to such an error message.

BTW, XMLSchema objects can be reused and are thread-safe, so there is no need to parse them again on each request, as done in

https://github.com/geopython/pycsw/blob/master/pycsw/ogc/csw/csw2.py#L1622

Stefan
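Building the XMLSchema once and reusing it across requests could look like the following minimal sketch. It requires lxml (XMLSchema has no standard-library equivalent); `get_schema` and `is_valid` are hypothetical names, and caching by file path with functools.lru_cache is just one way to make sure each schema is compiled only once.

```python
import functools

from lxml import etree


@functools.lru_cache(maxsize=None)
def get_schema(xsd_path):
    """Parse and compile the XSD once per path; later calls return the cached object."""
    return etree.XMLSchema(etree.parse(xsd_path))


def is_valid(xsd_path, xml_bytes):
    """Validate a request body against the shared, cached schema."""
    doc = etree.fromstring(xml_bytes)
    return get_schema(xsd_path).validate(doc)
```

lru_cache does not lock around the wrapped call, so two threads that miss the cache at the same moment may each build a schema once; that is harmless here because both results are equivalent.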
