Re: When is a number not a number
Hi,
Does anyone think this should be reported on the bug tracker?
lxml.objectify seems to classify a string of superscript digits as an integer, but then raises an exception when the value is actually parsed.
Thanks
Alex
from lxml import objectify

xml = """
<types>
    <mysuperscript>²²²²²²²²²²</mysuperscript>
</types>
"""

doc = objectify.fromstring(xml)
print(objectify.dump(doc))
Traceback (most recent call last):
  File "**********.py", line 11, in <module>
    print(objectify.dump(doc))
          ^^^^^^^^^^^^^^^^^^^
  File "src/lxml/objectify.pyx", line 1521, in lxml.objectify.dump
  File "src/lxml/objectify.pyx", line 1549, in lxml.objectify._dump
  File "src/lxml/objectify.pyx", line 1526, in lxml.objectify._dump
  File "src/lxml/objectify.pyx", line 646, in lxml.objectify.NumberElement.__repr__
  File "src/lxml/objectify.pyx", line 946, in lxml.objectify._parseNumber
ValueError: invalid literal for int() with base 10: '²²²²²²²²²²'
Looks like a bug to me. For reasons I don't yet understand, the int type check in objectify's type guesser (see https://lxml.de/objectify.html#how-data-types-are-matched) does not fail for this input:
>>> objectify.getRegisteredTypes()
[PyType(int, IntElement), PyType(float, FloatElement), PyType(bool, BoolElement), PyType(long, IntElement), PyType(str, StringElement), PyType(NoneType, NoneElement), PyType(none, NoneElement)]
>>> objectify.getRegisteredTypes()[0]
PyType(int, IntElement)
>>> print(objectify.getRegisteredTypes()[0].type_check("222"))
None
>>> print(objectify.getRegisteredTypes()[0].type_check("²²²²²²²²²²"))  # Should raise!
None
>>> print(objectify.getRegisteredTypes()[0].type_check("abcd"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "stringsource", line 67, in cfunc.to_py.__Pyx_CFunc_object____object___to_py.wrap
  File "src/lxml/objectify.pyx", line 1054, in lxml.objectify._checkInt
  File "src/lxml/objectify.pyx", line 1047, in lxml.objectify._checkNumber
ValueError
However:
>>> int("²²²²²²²²²²")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '²²²²²²²²²²'
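One possible explanation, offered only as a guess and not confirmed anywhere in this thread: in plain Python, str.isdigit() accepts superscript digits while int() does not. A check built on a Unicode "is digit" test would therefore pass this input, and the later int() conversion would fail. The sketch below uses only the standard library; the helper name int_accepts is made up for illustration and is not part of lxml.

```python
# Compare Python's digit predicates with what int() actually accepts.
# int_accepts is a hypothetical helper, not an lxml function.

def int_accepts(text):
    """Return True if the built-in int() can parse text."""
    try:
        int(text)
    except ValueError:
        return False
    return True


samples = ["222", "²²²²²²²²²²", "abcd"]

for text in samples:
    # Columns: input, str.isdigit(), str.isdecimal(), accepted by int()
    print(repr(text), text.isdigit(), text.isdecimal(), int_accepts(text))

# Printed rows:
# '222' True True True
# '²²²²²²²²²²' True False False
# 'abcd' False False False
```

If that guess is right, a check based on str.isdecimal(), or one that simply attempts the int() conversion, would reject the superscript string at type-guessing time rather than later in __repr__.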
Probably a bug in _checkNumber(): https://github.com/lxml/lxml/blob/d01872ccdf7e1e5e825b6c6292b43e7d27ae5fc4/s...

Best regards,
Holger