Mailman 3 December 2015 - lxml - The Python XML Toolkit

[lxml-dev] lxml has its page on launchpad
by Stefan Behnel 11 Apr '23

Hi all, I added the lxml project to Launchpad, the Ubuntu bug tracker. It also has a FAQ engine and a couple of other goodies. https://launchpad.net/lxml It's easy to sign up for Launchpad, BTW, no 90%-footnotes-contract. Have fun, Stefan

[lxml-dev] Checking whether a node is a comment/element
by Geoffrey Sneddon 10 Apr '23

Hi, What's the best way to check whether a given node is a comment or an element? For the former, I'm currently using isinstance(node, etree._Comment), which is rather obviously sub-optimal. -- Geoffrey Sneddon <http://gsnedders.com/>
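One way to answer this without touching private classes such as etree._Comment: as far as I know, comment nodes expose the public etree.Comment factory as their tag, while ordinary elements carry a string tag. The sketch below assumes that behaviour. node_kind is a hypothetical helper name, and the standard-library fallback is only there so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree shares the calls used below.
    import xml.etree.ElementTree as etree


def node_kind(node):
    """Classify a node as 'comment', 'element', or 'other' (hypothetical helper)."""
    if node.tag is etree.Comment:
        # Comment nodes use the Comment factory function as their tag.
        return "comment"
    if isinstance(node.tag, str):
        # Ordinary elements have a plain string tag name.
        return "element"
    # Anything else, for example a processing instruction.
    return "other"


root = etree.Element("root")
root.append(etree.Comment("a note"))
etree.SubElement(root, "child")

kinds = [node_kind(node) for node in root]
print(kinds)
```

Comparing against the public factory keeps the check independent of the underscore-prefixed implementation classes, which are not part of the documented API.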

[lxml-dev] Reparenting a node
by Lawrence Oluyede 30 Jan '23

I have a document A and a document B. I'd like to put a node extracted from A into document B, but I always get "ValueError: Element is not a child of this node." I didn't find any "setparent" in the API. How can I do this? -- Lawrence, oluyede.org - neropercaso.it "It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
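A minimal sketch of one way to do this, under stated assumptions: the error text looks like what remove() reports when it is called on something other than the node's direct parent, so the sketch avoids remove() entirely and appends a deep copy of the node to the target document. The document contents and variable names are made up for illustration. In lxml, appending the original node (without copying) should move it out of A, since an element has only one parent; copying leaves A untouched. The standard-library fallback is only there so the snippet runs without lxml.

```python
import copy

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree shares the calls used below.
    import xml.etree.ElementTree as etree

# Two independent documents (illustrative content).
doc_a = etree.fromstring("<a><item>payload</item></a>")
doc_b = etree.fromstring("<b/>")

# Pick the node in A and append a deep copy of it to B's root.
item = doc_a.find("item")
doc_b.append(copy.deepcopy(item))

print(etree.tostring(doc_b))
```

If the node should disappear from A afterwards, calling remove() on its actual parent (here doc_a itself) is the variant to try.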

[lxml-dev] lxml 2.0.5 released
by Stefan Behnel 11 Jan '23

Hi all,

lxml 2.0.5 is on PyPI. This is a bug-fix-only release of the stable 2.0 series.

Have fun,
Stefan

2.0.5 (2008-05-01)

Bugs fixed

* Resolving to a filename in custom resolvers didn't work.
* lxml did not honour libxslt's second error state "STOPPED", which let some XSLT errors pass silently.
* Memory leak in Schematron with libxml2 >= 2.6.31.

[lxml-dev] Building LXML Trunk
by Sidnei da Silva 31 Aug '22

Hi, I've tried to build lxml from trunk today, on Win32. Got the following error:

src\lxml\etree.c(880) : error C2059: syntax error : ')'
src\lxml\etree.c(881) : error C2059: syntax error : ')'
src\lxml\etree.c(882) : error C2059: syntax error : ')'
src\lxml\etree.c(883) : error C2059: syntax error : ')'

Any clue? Smells like a Pyrex issue?

-- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Building lxml statically for MeVisLab
by Corvus Corax 21 Jan '16

Hi, I'm trying to build lxml for MeVisLab 2.7 on Windows 7 x64, using the Python 2.7 delivered with MeVisLab and MS Visual Studio 2013 Professional. As my workplace is behind a proxy, I had to configure setup.py and buildlibxml.py to use the already downloaded libraries. How I did this is described in the attached build-changes.txt. When I try to build lxml, it fails with the output in build.log. It looks like it's trying to rebuild lxml.etree, but does it even have to? Would you please help me with this? I don't know what to do from here on. Sincerely, Corbie

help
by balavignesh 10 Jan '16

I am trying to build the lxml package from the following link https://pypi.python.org/packages/source/l/lxml/lxml-3.5.0.tar.gz#md5=9f0c5f… using pypy:

$ pypy setup.py build

I run this pypy in a virtualenv. I am using pypy 4.0.1 and lxml 3.5.0. The build fails with the following error:

cc -O2 -fPIC -Wimplicit -I/usr/include/libxml2 -Isrc/lxml/includes -I/root/venv/pypy/include -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
src/lxml/lxml.etree.c: In function '__Pyx_call_return_trace_func':
src/lxml/lxml.etree.c:4323:13: error: 'PyThreadState' has no member named 'tracing'
    tstate->tracing++;
            ^
src/lxml/lxml.etree.c:4324:13: error: 'PyThreadState' has no member named 'use_tracing'
    tstate->use_tracing = 0;
            ^
src/lxml/lxml.etree.c:4325:33: error: 'PyThreadState' has no member named 'c_tracefunc'
    if (CYTHON_TRACE && tstate->c_tracefunc)
                                ^
src/lxml/lxml.etree.c:4326:17: error: 'PyThreadState' has no member named 'c_tracefunc'
        tstate->c_tracefunc(tstate->c_traceobj, frame, PyTrace_RETURN, result);
                ^
src/lxml/lxml.etree.c:4326:37: error: 'PyThreadState' has no member named 'c_traceobj'
        tstate->c_tracefunc(tstate->c_traceobj, frame, PyTrace_RETURN, result);
                                    ^
src/lxml/lxml.etree.c:4326:58: error: 'PyTrace_RETURN' undeclared (first use in this function)
        tstate->c_tracefunc(tstate->c_traceobj, frame, PyTrace_RETURN, result);
                                                       ^
src/lxml/lxml.etree.c:4326:58: note: each undeclared identifier is reported only once for each function it appears in
src/lxml/lxml.etree.c:4327:17: error: 'PyThreadState' has no member named 'c_profilefunc'
        if (tstate->c_profilefunc)
                    ^
src/lxml/lxml.etree.c:4328:17: error: 'PyThreadState' has no member named 'c_profilefunc'
        tstate->c_profilefunc(tstate->c_profileobj, frame, PyTrace_RETURN, result);
                ^
src/lxml/lxml.etree.c:4328:39: error: 'PyThreadState' has no member named 'c_profileobj'
        tstate->c_profilefunc(tstate->c_profileobj, frame, PyTrace_RETURN, result);

and this error goes on for thousands of lines.

I tried googling this error and came across this link: https://bitbucket.org/pypy/pypy/issues/1185/cpyext-missing-members-in-pythr… I am not sure if this is the same issue as mine. Any help would be appreciated.

*Note*: This lxml package builds successfully with Python 2.7, 3.2, and 3.4.

Thank you.

-- Regards, *Balavignesh S*

Why are meta tag attributes removed by Cleaner?
by Misha Penkov 28 Dec '15

Hi, I'm trying to clean an HTML file that contains meta tags. I want the meta tags to be preserved as-is. Unfortunately, the cleaner removes everything except the "name" attribute of the tag. How can I prevent this behavior? Here is some example source:

    import lxml.html.clean

    html = """<html>
    <head>
    <meta name="keywords" content="test">
    </head>
    </html>"""

    def clean_html(html):
        """Removes parts of HTML unnecessary for processing."""
        kill_tags = ["map", "base", "iframe", "select", "noscript"]
        kwargs = {"scripts": True, "javascript": True, "comments": True,
                  "style": True, "links": True, "meta": False,
                  "page_structure": False, "processing_instructions": True,
                  "embedded": True, "frames": False, "forms": False,
                  "annoying_tags": True, "kill_tags": kill_tags,
                  "whitelist_tags": ["meta"]}
        cleaner = lxml.html.clean.Cleaner(**kwargs)
        cleaned = cleaner.clean_html(unicode(html))
        return cleaned

    print clean_html(html)

On my system, I see this printed to standard output:

    <html>
    <head>
    <meta name="keywords">
    </head>
    </html>

How can I prevent the cleaner from removing the content attribute?

Cheers, Michael

Relax NG Compact syntax support
by Dirkjan Ochtman 27 Dec '15

Hi all,

I recently had a situation at work where I wanted to use the RELAX NG Compact syntax to write a schema for some XML configuration files we are using. Having happily worked with lxml before, I googled around to see if I could feed it RNC schemas somehow, and was disappointed to find only some mailing list threads saying this was not possible. There was some Python code by David Mertz that implemented a small subset of RNC, but this was deemed too limited to be fit for integration with lxml.

I've since spent quite some time looking at rnc2rng and polishing it. My fork is here: https://github.com/djc/rnc2rng

It now supports almost all of the syntax surface according to the RNC spec, I've created a fairly comprehensive regression test set (with about 90% coverage right now), and it works successfully on some relatively large real-world schemas that I ran into.

Would you be interested in collaborating on somehow providing support for rnc2rng in lxml?

Cheers, Dirkjan

Fwd: Building wheels for Windows
by anatoly techtonik 13 Dec '15

Hi. Top voted issue: https://bugs.launchpad.net/lxml/+bugs?orderby=-heat&start=0 It looks like the Makefile has a command to build wheels. Why are they not uploaded to PyPI? https://pypi.python.org/pypi/lxml/3.5.0 -- anatoly t.
