lxml.objectify interprets <x>1000_000</x> as int in Python >= 3.6 - should it?
Hi,

python2.7 -c 'from lxml import objectify; root = objectify.fromstring("<root><x>1000_000</x></root>"); print(root.x, type(root.x), type(root.x.pyval)); print(root.x.text, type(root.x.text)); print(objectify.dump(root))'
('1000_000', <type 'lxml.objectify.StringElement'>, <type 'str'>)
('1000_000', <type 'str'>)
root = None [ObjectifiedElement]
    x = '1000_000' [StringElement]

python3.6 -c 'from pytaf.objectify.xmsg import *; from lxml import etree, objectify; root = objectify.fromstring("<root><x>1000_000</x></root>"); print(root.x, type(root.x), type(root.x.pyval)); print(root.x.text, type(root.x.text)); print(objectify.dump(root))'
1000000 <class 'lxml.objectify.IntElement'> <class 'int'>
1000_000 <class 'str'>
root = None [ObjectifiedElement]
    x = 1000000 [IntElement]

According to https://www.w3.org/TR/xmlschema-2/#integer 1000_000 is not a valid integer literal. But it has been one for Python since 3.6.

The magic lxml.objectify type lookup/annotation simply does int(s) and interprets success as "shall be interpreted as int". One could argue that - when parsing XML data - this is not the right/sane/intuitive choice. Or is it? :-) <x>1000_000</x> is not an integer in the XML world.

Opinions on whether that should be changed, maybe made switchable? Would it break things for you?

Of course, Python 3.6 (and consequently this objectify behavior) is 5 years old now and nobody seemed bothered. Plus you can customize the default lookup mechanism anyway.

Cheers,
Holger
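The core of the behavior can be reproduced without lxml. The following is a minimal sketch, not lxml code: python_accepts_int mimics the "try int(s), treat success as int" check described above, and is_xsd_integer is a hypothetical stricter check whose pattern (optional sign followed by ASCII digits) is an assumption based on the xs:integer lexical form.

```python
import re

# Assumed pattern for the xs:integer lexical form: optional sign, then ASCII digits.
_XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


def python_accepts_int(s):
    """Mimic the check described above: success of int(s) means 'is an int'."""
    try:
        int(s)
    except ValueError:
        return False
    return True


def is_xsd_integer(s):
    """Hypothetical stricter check: accept only the XML Schema integer form."""
    # Surrounding XML whitespace is ignored before matching.
    return _XSD_INTEGER.fullmatch(s.strip(" \t\r\n")) is not None


if __name__ == "__main__":
    for text in ("1000000", "1000_000", "-42", "abc"):
        print(text, python_accepts_int(text), is_xsd_integer(text))
```

On Python 3.6 and later, the two checks disagree only for the underscore case among these inputs: int() accepts "1000_000", while the stricter pattern rejects it.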
Hi Holger,

sorry for the late reply. But since no one else replied, it doesn't seem to be an important topic for the readers.

Holger.Joukl@LBBW.de wrote on 21.07.21 at 20:17:
According to https://www.w3.org/TR/xmlschema-2/#integer 1000_000 is not a valid integer literal. But it is for Python since 3.6.
The magic lxml.objectify type lookup/annotation simply does int(s) and interprets success as "shall be interpreted as int". One could argue that - when parsing XML data - this is not the right/sane/intuitive choice. Or is it? :-) <x>1000_000</x> is not an integer in the XML world.
Then we shouldn't make it one. It's unlikely that data gets passed through XML in Python syntax. We have the same for "True" and "False", which come out as str, not bool. And this applies to FloatElement as well, which uses float() as its parser and thus also supports "_" in Py3.6+.

I'll see what I can come up with.

Stefan
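To illustrate the float and boolean cases mentioned above, here is a small sketch that uses only the standard library. The helpers is_xsd_double and is_xsd_boolean are hypothetical names, and the patterns only approximate the XML Schema lexical forms (for example, "+INF" is accepted here, which is an assumption); none of this is taken from lxml itself.

```python
import re

# Approximate xs:double lexical form: decimal mantissa, optional exponent,
# or one of the special values INF / -INF / NaN (assumption: "+INF" allowed too).
_XSD_DOUBLE = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?|[+-]?INF|NaN"
)

# xs:boolean literals are lower case; Python's "True"/"False" are not among them.
_XSD_BOOLEAN = {"true", "false", "1", "0"}


def is_xsd_double(s):
    """Hypothetical check: does s look like an XML Schema double?"""
    return _XSD_DOUBLE.fullmatch(s.strip(" \t\r\n")) is not None


def is_xsd_boolean(s):
    """Hypothetical check: does s look like an XML Schema boolean?"""
    return s.strip(" \t\r\n") in _XSD_BOOLEAN


if __name__ == "__main__":
    # float() follows Python syntax and accepts underscores on Python 3.6+.
    print(float("1_000.5"), is_xsd_double("1_000.5"))
    print(is_xsd_boolean("true"), is_xsd_boolean("True"))
```

The design point is the same as for integers: a check based on the XML lexical form rejects Python-only spellings that float() would happily parse.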
Stefan Behnel wrote on 12.08.21 at 09:21:
I'll see what I can come up with.
https://github.com/lxml/lxml/commit/83e6c031994d553b74991501c6cd85e3517fadd8

Stefan
participants (2)
- Holger.Joukl@LBBW.de
- Stefan Behnel