Undocumented or unwanted change in keyword parameter?

Hello,

I don't want to report a bug (yet), as I don't know what this should be reported as, but it's definitely a problem.

My info on the system where I reproduced it:

    Python              : sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)
    lxml.etree          : (3, 5, 0, 0)
    libxml used         : (2, 9, 1)
    libxml compiled     : (2, 9, 1)
    libxslt used        : (1, 1, 28)
    libxslt compiled    : (1, 1, 28)

In lxml==3.2.1 under Python 2.7 I instantiated it successfully like this:

    XMLParser([...], XMLSchema_schema=None, [...])

(note the underscore)

In lxml==3.5.0 under Python 3.4 this started failing with the error:

    Traceback (most recent call last):
      File "/path/to/my/script.py", line 7, in <module>
        XMLSchema_schema=xmlSchema)
      File "src/lxml/parser.pxi", line 1437, in lxml.etree.XMLParser.__init__ (src/lxml/lxml.etree.c:120522)
    TypeError: __init__() got an unexpected keyword argument 'XMLSchema_schema'

The code looks like this:

    from lxml import etree

    xmlSchema = etree.XMLSchema(file='/path/to/some/schema.xsd')
    parser = etree.XMLParser(
        remove_blank_text=True,
        attribute_defaults=True,
        XMLSchema_schema=xmlSchema)

It can be fixed by just using "schema" instead of "XMLSchema_schema" as the keyword argument:

    XMLParser([...], schema=xmlSchema, [...])

---

I had a look at the sources of both versions. The affected source line in file src/lxml/lxml.etree.pyx:1437 looks the same in both versions:

    def __init__([...], XMLSchema schema=None, [...]):

so I guess somewhere between 3.2.1 and 3.5.0 the type hint stopped being baked into the keyword argument?

My question would be: Is this a bug in the documentation, which fails to identify the keyword parameter as "schema", or is this an unwanted change of a parameter that should actually remain "XMLSchema_schema" and accidentally turned into "schema"?

cheers
Oliver
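
A minimal, library-free sketch of the behaviour reported above. FakeSchema and FakeParser are hypothetical stand-ins, not lxml classes; the sketch only assumes that the constructor declares a keyword named "schema" and no keyword named "XMLSchema_schema". The exact wording of the TypeError message differs between Python versions, so it is not reproduced here.

```python
class FakeSchema:
    """Hypothetical stand-in for etree.XMLSchema (assumption, not lxml code)."""


class FakeParser:
    """Hypothetical stand-in for etree.XMLParser; only the keyword names matter."""

    def __init__(self, remove_blank_text=False, attribute_defaults=False,
                 schema=None):
        self.remove_blank_text = remove_blank_text
        self.attribute_defaults = attribute_defaults
        self.schema = schema


def try_keyword(name, value):
    """Call the constructor with a single keyword and report what happened."""
    try:
        FakeParser(**{name: value})
    except TypeError as exc:
        # An undeclared keyword name is rejected with a TypeError.
        return "rejected: {}".format(exc)
    return "accepted"


result_good = try_keyword("schema", FakeSchema())
result_bad = try_keyword("XMLSchema_schema", FakeSchema())
print(result_good)
print(result_bad)
```

With a real lxml installation, the same check could be made by passing each keyword to etree.XMLParser directly.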

Hi,

> In lxml==3.2.1 under Python 2.7 I instantiated it successfully like this
> XMLParser([...], XMLSchema_schema=None, [...]) (note the underscore)

Unfortunately I can't try this with 3.2.1 (the .tgz package on lxml.de seems broken), but the only reason I can imagine why this should ever have worked is that arbitrary keyword arguments were accepted back then. The source code doesn't support this theory, though, so maybe it was a Cython quirk at the time? Are you sure your report is accurate?

Which is sane and expected imho, so my take on it is this: Not a bug, works as intended. If it ever worked with keyword argument 'XMLSchema_schema' then this was a bug and it is now fixed.

Holger

Landesbank Baden-Wuerttemberg
Anstalt des oeffentlichen Rechts
Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704 Amtsgericht Stuttgart

Hello Holger,

thanks to your questions I got to the root of the problem. You are correct in assuming that XMLSchema_schema very likely never worked, so this is purely a documentation problem.

If you look at the API documentation at http://lxml.de/api/lxml.etree.XMLParser-class.html

    XMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False,
              load_dtd=False, no_network=True, ns_clean=False, recover=False,
              XMLSchema schema=None, remove_blank_text=False, resolve_entities=True,
              remove_comments=False, remove_pis=False, strip_cdata=True,
              collect_ids=True, target=None, compact=True)

schema is the only parameter that contains what I guess is a Cython type hint, which IMO does not belong in the API documentation. This is not only confusing for humans who don't know about Cython type hints. It also causes the automatic stub generation in the PyCharm IDE to produce the wrong keyword argument "XMLSchema_schema", which, in combination with the API doc, can lead one to believe that the keyword argument is in fact "XMLSchema_schema". But this is basically a PyCharm bug ...

Cheers
Oliver

On Thu, 17 Mar 2016 at 09:32 Holger Joukl <Holger.Joukl@lbbw.de> wrote:
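
To illustrate how the documented signature can be read either way, here is a rough sketch that extracts keyword names from a shortened copy of the signature text. The function naive_names is only a guess at how a stub generator could arrive at "XMLSchema_schema"; it is not PyCharm's actual code. Both helpers are hypothetical and assume a flat, comma-separated parameter list without nested commas.

```python
# Shortened copy of the parameter list shown in the API documentation.
DOC_SIGNATURE = ("self, encoding=None, recover=False, "
                 "XMLSchema schema=None, remove_blank_text=False")


def keyword_names(signature):
    """Treat 'Type name=default' as a typed parameter and keep only the name."""
    names = []
    for param in signature.split(","):
        declaration = param.split("=")[0].strip()
        # "XMLSchema schema" -> type "XMLSchema", name "schema"
        names.append(declaration.split()[-1])
    return names


def naive_names(signature):
    """Hypothetical misreading: glue type and name together with an underscore."""
    return [param.split("=")[0].strip().replace(" ", "_")
            for param in signature.split(",")]


correct = keyword_names(DOC_SIGNATURE)
misread = naive_names(DOC_SIGNATURE)
print(correct)  # ['self', 'encoding', 'recover', 'schema', 'remove_blank_text']
print(misread)  # ['self', 'encoding', 'recover', 'XMLSchema_schema', 'remove_blank_text']
```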

I respectfully disagree. I think the API docs should provide whatever information they can give. The notation resembles C function signatures here, so I think it's reasonably intuitive.

Maybe API docs using Python 3 type annotation syntax would be nicer for a Python API, regardless of whether it's implemented in Cython. Of course, one could also argue that it's actually a Cython API, which happens to be importable & usable in Python ;-)

Holger
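
For comparison, a small sketch of what the Python 3 annotation syntax mentioned above could look like. The XMLSchema class and make_parser function here are hypothetical placeholders, not the lxml API; the point is only that the annotation carries the type while the keyword name stays "schema".

```python
import inspect
from typing import Optional


class XMLSchema:
    """Hypothetical placeholder type, standing in for etree.XMLSchema."""


def make_parser(encoding: Optional[str] = None,
                schema: Optional[XMLSchema] = None,
                remove_blank_text: bool = False):
    """Hypothetical function whose signature uses Python 3 annotations."""
    return {"encoding": encoding, "schema": schema,
            "remove_blank_text": remove_blank_text}


sig = inspect.signature(make_parser)
# Parameter names are kept separate from their type annotations.
param_names = list(sig.parameters)
schema_annotation = sig.parameters["schema"].annotation
print(param_names)  # ['encoding', 'schema', 'remove_blank_text']
print(schema_annotation)
```

Tools that read the signature through inspect see the name and the type as separate fields, so there is no ambiguity about which token is the keyword.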

Holger Joukl wrote on 17.03.2016 at 17:23:

If it helps ... https://github.com/lxml/lxml/commit/477e721dc083ed512182d9a2339bb64ee9f37937

Stefan

Participants (3): Holger Joukl, Oliver Bestwalter, Stefan Behnel