Hi all,

I'm working on a parser for XML files and I found a behaviour in lxml that looks like a bug, on both Linux and OS X.

The problem shows up when using the .find() or .findall() functions on a tag declared inside a namespace that has no prefix (a default namespace).

Here is an example to reproduce the bug:

Vengeance:ioc luisgf$ python3
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 5 2014, 20:42:22)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> data = open("duqu.ioc","rb").read()
>>> root = etree.fromstring(data)
>>> print(root.nsmap)
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://schemas.mandiant.com/2010/ioc', 'xsd': 'http://www.w3.org/2001/XMLSchema'}
>>> print(root[6].tag)
{http://schemas.mandiant.com/2010/ioc}definition
>>> root.find('definition', root.nsmap)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1448, in lxml.etree._Element.find (src/lxml/lxml.etree.c:51339)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/lxml/_elementpath.py", line 281, in find
    it = iterfind(elem, path, namespaces)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/lxml/_elementpath.py", line 271, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/lxml/_elementpath.py", line 234, in _build_path_iterator
    return _cache[(path, namespaces and tuple(sorted(namespaces.items())) or None)]
TypeError: unorderable types: NoneType() < str()
-- 
Luis González Fernández
https://www.luisgf.es
PGP ID: C918B80F (DD6F BFC1 FC14 4C81 34F8 EA1E 6BCB C27F C918 B80F)
Twitter: @luisgf_2001 / Jabber: luisgf@mijabber.es
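The TypeError at the bottom of the traceback above can be reproduced without lxml at all. The following is only a minimal sketch of the failing expression, `tuple(sorted(namespaces.items()))`, applied to a plain dict shaped like the reported `root.nsmap`; it is not lxml's actual code path, and the names `cache_key`, `error` and `prefixed_only` are made up for illustration.

```python
# A dict shaped like the nsmap from the report: the default namespace
# is stored under the key None, the others under string prefixes.
nsmap = {
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    None: "http://schemas.mandiant.com/2010/ioc",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}

# Sorting the items has to compare None with a str, which Python 3
# refuses to do, so building the cache key fails.
try:
    cache_key = tuple(sorted(nsmap.items()))
    error = None
except TypeError as exc:
    cache_key = None
    error = exc
    print("sorting failed:", exc)

# Dropping the None entry leaves only string prefixes, which sort fine.
prefixed_only = {k: v for k, v in nsmap.items() if k is not None}
print(tuple(sorted(prefixed_only.items())))
```

The exact wording of the exception message depends on the Python version, so the sketch only prints it rather than relying on it.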
Luis González Fernández wrote on 26.12.2014 at 22:37:
> I'm working on a parser for XML files and I found a behaviour in lxml
> that looks like a bug, on both Linux and OS X.
>
> The problem shows up when using the .find() or .findall() functions on
> a tag declared inside a namespace that has no prefix.
>
> Here is an example to reproduce the bug:
>
> Vengeance:ioc luisgf$ python3
> Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 5 2014, 20:42:22)
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from lxml import etree
> >>> data = open("duqu.ioc","rb").read()
> >>> root = etree.fromstring(data)
> >>> print(root.nsmap)
> {'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://schemas.mandiant.com/2010/ioc', 'xsd': 'http://www.w3.org/2001/XMLSchema'}
> >>> print(root[6].tag)
> {http://schemas.mandiant.com/2010/ioc}definition
> >>> root.find('definition', root.nsmap)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "lxml.etree.pyx", line 1448, in lxml.etree._Element.find (src/lxml/lxml.etree.c:51339)
>   File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/lxml/_elementpath.py", line 281, in find
>     it = iterfind(elem, path, namespaces)
>   File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/lxml/_elementpath.py", line 271, in iterfind
>     selector = _build_path_iterator(path, namespaces)
>   File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/lxml/_elementpath.py", line 234, in _build_path_iterator
>     return _cache[(path, namespaces and tuple(sorted(namespaces.items())) or None)]
> TypeError: unorderable types: NoneType() < str()

Thanks for the report. That's a bug. "None" prefixes should be rejected with a visible ValueError. There is no reason why they should be accepted.

(Note that your example above is contrived. You'd never pass an unknown prefix-namespace map in real code.)

Stefan
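
For readers hitting the same problem, two lookups that avoid a None prefix are sketched below: binding the default namespace URI to a prefix of your own choosing, or spelling the tag in the `{uri}name` form that `print(root[6].tag)` showed in the report. This is only an illustrative sketch; the inline document is a made-up stand-in for duqu.ioc (the `short_description` element is an assumption), and the `ioc` prefix is arbitrary. The import falls back to the standard library's ElementTree, which supports the same two calls, so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree

IOC_NS = "http://schemas.mandiant.com/2010/ioc"

# Hypothetical stand-in for duqu.ioc: the root declares a default
# namespace (no prefix), so every child lives in that namespace.
data = (
    b'<ioc xmlns="http://schemas.mandiant.com/2010/ioc" '
    b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    b"<short_description>demo</short_description>"
    b"<definition/>"
    b"</ioc>"
)
root = etree.fromstring(data)

# Option 1: map the namespace URI to a prefix you pick yourself and
# use that prefix in the path. The prefix in the document is irrelevant.
ns = {"ioc": IOC_NS}
by_prefix = root.find("ioc:definition", ns)

# Option 2: fully qualified tag name, no namespace map needed.
by_clark = root.find("{%s}definition" % IOC_NS)

# An unqualified name does not match a namespaced element.
unqualified = root.find("definition")

print(by_prefix.tag)
print(by_clark.tag)
print(unqualified)
```

Choosing the prefix in your own code, rather than reusing whatever map the document happens to carry, is in line with the remark above about not passing an unknown prefix-namespace map.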
Stefan Behnel wrote on 26.12.2014 at 23:07:
> Thanks for the report. That's a bug.

https://github.com/lxml/lxml/commit/91dcd48b656147120bf1f7955ba151a1a59eb9b4

Stefan

participants (2)
- Luis González Fernández
- Stefan Behnel