Hi LXML mailing list,
I'm working on a metadata validator, and we're using LXML to do the XML wrestling. I have a few questions brewing about the URI resolvers, but I'm still gathering clear evidence before I ask them. In the meantime, I've come across something strange in the lxml.etree.Resolver.resolve_string method. I wonder whether it's a bug, and whether it's related to Python 2 vs Python 3.
If I have an arbitrary, but valid, XML file at a local path 'file:///something-or-other/test.xml' I can attempt to pass this through the resolver system by performing the following:

    from lxml import etree
    from urllib3.util.url import parse_url

    def resolve(self, url, id, context):
        # get the correct part of the URL
        url_components = parse_url(url)

        # get contents of the XML file
        with open(url_components.path, mode='rb') as f:
            contents = f.read()

        # pass this byte object to the resolve_string method
        return etree.Resolver.resolve_string(contents, id, context, base_url=url)

Calling this raises the following error:

    Traceback (most recent call last):
      File "/home/XXXXX/.p2/pool/plugins/org.python.pydev.core_184.108.40.206805051638/pysrc/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
        exec(exp, global_vars, local_vars)
      File "<console>", line 1, in <module>
      File "src/lxml/docloader.pxi", line 61, in lxml.etree.Resolver.resolve_string (src/lxml/lxml.etree.c:97816)
    TypeError: argument must be a byte string or unicode string

I get the same error with the alteration of the file open parameters:

    with open(url_components.path, mode='rt', encoding='utf_8') as f:
        contents = f.read()

I've checked that the contents variable is what I expect it to be, and I've tried calling encode() and decode() on the contents string, but to no avail!
I've even tried:

    etree.Resolver.resolve_string(b'<xml></xml>', id, context, base_url=url)

...and...

    etree.Resolver.resolve_string(u'<xml></xml>', id, context, base_url=url)

...with the same result.
Any ideas? Surely either the byte string or the Python 3 (Unicode) string, or both, should work!? Am I missing something obvious?
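For comparison, the resolver example in the lxml documentation, as far as I can tell, calls resolve_string on the resolver instance and passes only the string and the context, without the id. Below is a minimal sketch of that pattern, not my actual code: the class name InMemoryResolver, the example.dtd system ID and the greeting entity are all made up for illustration, and the content comes from an in-memory byte string rather than a file.

```python
import io

from lxml import etree


class InMemoryResolver(etree.Resolver):
    """Hypothetical resolver that serves a DTD from an in-memory byte string."""

    def resolve(self, url, id, context):
        # Stand-in for the bytes that would otherwise be read from a file.
        contents = b'<!ENTITY greeting "hello">'
        # Called on the instance (self), with the string and the context only;
        # the id is not forwarded. base_url is an optional keyword argument.
        return self.resolve_string(contents, context, base_url=url)


# load_dtd=True makes the parser ask the registered resolvers for the DTD.
parser = etree.XMLParser(load_dtd=True)
parser.resolvers.add(InMemoryResolver())

# "example.dtd" does not exist on disk; the resolver supplies its content.
xml = b'<!DOCTYPE doc SYSTEM "example.dtd"><doc>&greeting;</doc>'
root = etree.parse(io.BytesIO(xml), parser).getroot()
print(root.text)
```

If that is the intended calling convention, the TypeError above may come from how I'm passing the arguments rather than from the string type, but I'd appreciate confirmation either way.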
I'm using Python 3.4 and LXML 3.8.0.
All the best,