
Hi, thanks for investigating this.

Kai Hillmann wrote on 17.11.21 at 00:23:
I tried to get a little bit deeper and started to build lxml myself to play around with this possible issue.
I think I found that this is an actual bug, and also how to fix it. With my fix, all of your "make test" tests pass, as does the simple test suite linked in my previous mail, which was written to demonstrate this problem.
So far I have only tested on Linux (20.04 LTS, python2.7.18/python3.8.10), not on Windows or macOS. Maybe one of you could verify the patch on these platforms?
Patch against LXML Master (v4.7.0a0/tag: lxml-4.7.0-pre - 982f8d5612925010a12a70748a077af846def6be): https://pastebin.com/raw/x0Zmb0Kn
Should I create a bug report for this in your Launchpad tracker to get this patch merged (if acceptable)?
A pull request (or patch) is usually ok. I'm not strict on requiring tickets for each change. A PR would be better, though, including the tests, since it would get us a free CI run on the changes.
What do you think about the way it has been fixed? I think the main problems here are the bytestring vs unicode string comparison regarding namespaces/prefixes/uris -- I'm not sure whether there are some more places where it needs to be fixed as well.
Yeah, right, I also don't think it's the right way to solve this since it looks more like a data cleanliness issue. Meaning: why is there a mix of byte strings and unicode strings in the first place? In Py3, we should always have unicode strings in our hands. There must be some incorrect data conversion happening somewhere.

Stefan
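
For illustration, a minimal sketch of why a mix of byte strings and unicode strings breaks comparisons in Py3, and what cleaning the data up at the conversion point could look like. The prefix and URI values and the helper name `to_text` are hypothetical; none of this is taken from lxml or from the patch.

```python
# Hypothetical namespace values; not taken from lxml or from the patch.
uri_text = "http://example.com/ns"
uri_bytes = b"http://example.com/ns"

# In Python 3, bytes and str never compare equal, even with identical content.
mixed_equal = (uri_bytes == uri_text)


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings once, where data enters."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# After normalising both sides to unicode, the comparison behaves as expected.
normalized_equal = (to_text(uri_bytes) == to_text(uri_text))

# The same applies to mappings: keys and values are normalised before storing,
# so later lookups with unicode strings find them.
nsmap = {to_text(b"pfx"): to_text(uri_bytes)}
```

The point of the sketch is that the conversion happens once, at the place where the data comes in, rather than in every comparison that might later see mixed types.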