Mailman 3 February 2010 - lxml - The Python XML Toolkit

[lxml-dev] lxml has its page on launchpad
by Stefan Behnel 11 Apr '23

Hi all,

I added the lxml project to Launchpad, the Ubuntu bug tracker. It also has a FAQ engine and a couple of other goodies.

https://launchpad.net/lxml

It's easy to sign up for Launchpad, BTW, no 90%-footnotes-contract.

Have fun,
Stefan

[lxml-dev] Checking whether a node is a comment/element
by Geoffrey Sneddon 10 Apr '23

Hi,

What's the best way to check whether a given node is a comment or an element? For the former, I'm currently using isinstance(node, etree._Comment), which is rather obviously sub-optimal.

--
Geoffrey Sneddon
<http://gsnedders.com/>
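One commonly used alternative to the private etree._Comment class is to compare the node's tag against the public etree.Comment factory: comment nodes carry that factory as their tag, while ordinary elements carry a string. The sketch below is a minimal illustration and is not taken from the thread. node_kind is a hypothetical helper name, and the import falls back to the standard library's ElementTree (which follows the same tag convention) only so that the example still runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib so the sketch runs without lxml.
    import xml.etree.ElementTree as etree


def node_kind(node):
    """Classify a node by its tag (hypothetical helper for illustration)."""
    if node.tag is etree.Comment:
        # Comment nodes use the Comment factory function as their tag.
        return "comment"
    if isinstance(node.tag, str):
        # Ordinary elements have a plain string tag.
        return "element"
    return "other"


root = etree.Element("root")
root.append(etree.Comment("a note"))
etree.SubElement(root, "child")

kinds = [node_kind(node) for node in root]
print(kinds)  # ['comment', 'element']
```

The check relies only on public names, so it does not depend on the private class layout of lxml.etree.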

[lxml-dev] Reparenting a node
by Lawrence Oluyede 30 Jan '23

I have a doc A and a doc B. I'd like to put a node extracted from A into document B, but I always get a ValueError:

ValueError: Element is not a child of this node.

I didn't find any "setparent" in the API. How can I do this?

--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
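The quoted ValueError is what remove() raises when it is called on an element that is not the node's direct parent, so that is assumed to be the cause here. A minimal sketch under that assumption: copy the node into B, then detach the original from its actual parent. The documents and tag names are made up for illustration. In lxml itself, doc_b.append(item) alone would already move the node out of A, since an lxml element has exactly one parent; the explicit copy is used so that the sketch also behaves the same with the standard library fallback.

```python
import copy

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib so the sketch runs without lxml.
    import xml.etree.ElementTree as etree

# Hypothetical documents A and B.
doc_a = etree.fromstring("<a><section><item>payload</item></section></a>")
doc_b = etree.fromstring("<b/>")

section = doc_a.find("section")
item = section.find("item")

# remove() only accepts direct children: <item> is a grandchild of <a>,
# so asking the root of A to remove it raises ValueError.
try:
    doc_a.remove(item)
    removed_from_root = True
except ValueError:
    removed_from_root = False

# Give B its own independent copy of the node ...
doc_b.append(copy.deepcopy(item))
# ... and detach the original from its real parent in A.
section.remove(item)

print(removed_from_root, len(section), doc_b.find("item").text)
```

If A should keep the node, drop the final section.remove(item) call and keep only the deepcopy.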

[lxml-dev] lxml 2.0.5 released
by Stefan Behnel 11 Jan '23

Hi all,

lxml 2.0.5 is on PyPI. This is a bug-fix-only release of the stable 2.0 series.

Have fun,
Stefan

2.0.5 (2008-05-01)

Bugs fixed

* Resolving to a filename in custom resolvers didn't work.
* lxml did not honour libxslt's second error state "STOPPED", which let some XSLT errors pass silently.
* Memory leak in Schematron with libxml2 >= 2.6.31.

[lxml-dev] Building LXML Trunk
by Sidnei da Silva 31 Aug '22

Hi,

I've tried to build lxml from trunk today, on Win32. Got the following error:

src\lxml\etree.c(880) : error C2059: syntax error : ')'
src\lxml\etree.c(881) : error C2059: syntax error : ')'
src\lxml\etree.c(882) : error C2059: syntax error : ')'
src\lxml\etree.c(883) : error C2059: syntax error : ')'

Any clue? Smells like a Pyrex issue?

--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

[lxml-dev] confusing xpath performance characteristics
by jholg＠gmx.de 24 Aug '12

Hi,

I ran into some performance characteristics of lxml/libxml2 XPath that I find rather confusing. I'm trying to find the @type attribute of a certain element in an XML Schema (which contains lots of complexType definitions with lots of elements in them; unfortunately I can't post the schema):

>>> timeit.Timer(stmt="""xpath(schema)""", setup="""from lxml import etree, objectify; schema=etree.parse('NDM.xsd').getroot(); xpath = etree.XPath('//xs:element[@name="equity"]/@type', namespaces={'xs': 'http://www.w3.org/2001/XMLSchema'})""").repeat(number=10)
[0.095885038375854492, 0.096823930740356445, 0.096174955368041992]

So I think I'm being smart and give a little more path information, reckoning that this should *improve* performance:

>>> timeit.Timer(stmt="""xpath(schema)""", setup="""from lxml import etree, objectify; schema=etree.parse('NDM.xsd').getroot(); xpath = etree.XPath('/xs:schema//xs:element[@name="equity"]/@type', namespaces={'xs': 'http://www.w3.org/2001/XMLSchema'})""").repeat(number=10)
[0.1770780086517334, 0.1775970458984375, 0.17748594284057617]

Hm. Performance degrades slightly. I'm adding even more of the path to where my desired elements live in the schema:

>>> timeit.Timer(stmt="""xpath(schema)""", setup="""from lxml import etree, objectify; schema=etree.parse('xsd/NDM.xsd').getroot(); xpath = etree.XPath('/xs:schema/xs:complexType//xs:element[@name="equity"]/@type', namespaces={'xs': 'http://www.w3.org/2001/XMLSchema'})""").repeat(number=10)
[103.79744100570679, 103.83671712875366, 103.61817717552185]

What???

>>> timeit.Timer(stmt="""xpath(schema)""", setup="""from lxml import etree, objectify; schema=etree.parse('/ae/data/pydev/hjoukl/NDM/SVN_CO/TRUNK/ndm/reference/xsd/NDM.xsd').getroot(); xpath = etree.XPath('/xs:schema/xs:complexType/*/xs:element[@name="equity"]/@type', namespaces={'xs': 'http://www.w3.org/2001/XMLSchema'})""").repeat(number=10)
[0.044407129287719727, 0.044126987457275391, 0.044229030609130859]
>>>

Ok, this version's better than my naive approach, which seems logical to me. But why would '/xs:schema/xs:complexType//xs:element[@name="equity"]/@type' perform drastically slower than '/xs:schema/xs:complexType/*/xs:element[@name="equity"]/@type'?

libxml2 problem? Running the same xpaths in Oxygen I don't notice performance differences (can't profile this).

Holger

Re: [lxml-dev] [Bug 488222] Feature request: add better schematron support to lxml
by jholg＠gmx.de 03 Mar '10

>> This speaks for pulling the result accessor into the Schematron class, probably as a class attribute that can be overridden on an instance level.
>>
>> The same might make sense for the iso-schematron implementation xsl transformation steps.
>
> Sounds like a much better interface. Any interesting global options would be better overridden by subtyping the validator class, so class attributes make sense to me.

Committed to trunk: https://codespeak.net/viewvc/?view=rev&revision=71090

This simply exposes the skeleton xslt steps and the validation result xpath as class attributes. I consider the iso-schematron work pretty much finished for now...

Holger

[lxml-dev] Architecture/best practice question.
by Emanuele D'Arrigo 03 Mar '10

Hi everybody,

a bit of a general architecture/best practice question. Say you want to keep in sync an ElementTree with a separate tree structure, one that is parallel but does not have the exact same nodes and yet needs to be informed and updated whenever a change in the ElementTree occurs.

ElementTree supports custom elements and I guess it wouldn't be too difficult to override the standard methods of an element to do something before or after any change. -However-, I understand that ElementProxies cannot store instance-level data as the instances are not persistent and are garbage collected more or less as soon as they are no longer referenced somewhere.

So, what I'm wondering is, how do I tell a method of a custom element what object in the parallel structure to inform whenever a change arises? I guess one way would be to store at -class level- (or where else?) a dictionary mapping custom ElementProxies instances to nodes of the parallel structure. In so doing, whenever a custom method is executed it can get hold of the parallel structure.

Is that a reasonable way to do it or are there better ones?

Manu

[lxml-dev] lxml 2.2.5 released
by Stefan Behnel 28 Feb '10

Hi,

I just released lxml 2.2.5 to PyPI.

http://pypi.python.org/pypi/lxml/2.2.5

This is a bug fix release for the stable 2.2 series. It fixes three crash bugs in XPath, XSLT and lxml.objectify that occurred on certain operations. Updating is generally recommended, but not required if these did not affect your code so far.

Stefan

2.2.5 (2010-02-28)

Features added

* Support for running XSLT extension elements on the input root node (e.g. in a template matching on "/").

Bugs fixed

* Crash in XPath evaluation when reading smart strings from a document other than the original context document.
* Support recent versions of html5lib by not requiring its XHTMLParser in htmlparser.py anymore.
* Manually instantiating the custom element classes in lxml.objectify could crash.
* Invalid XML text characters were not rejected by the API when they appeared in unicode strings directly after non-ASCII characters.
* lxml.html.open_http_urllib() did not work in Python 3.
* The functions strip_tags() and strip_elements() in lxml.etree did not remove all occurrences of a tag in all cases.
* Crash in XSLT extension elements when the XSLT context node is not an element.

Re: [lxml-dev] Problem writing special HTML characters &#...
by Roberto Brunelli 26 Feb '10

Jens,

thanks for the hints, but I still do not understand how to solve the problem I have. Just a couple of steps to better show it:

RSSroot = etree.Element('rss')
etree.SubElement(RSSroot, 'title').text = '& # 200;' # space between & # added here just to make sure the actual chars are shown
print etree.tostring(RSSroot)

and I get

<rss><title>&amp;#200;</title></rss>

so the '&' turns out to be sanitized, while I wanted the special character È to go along ...

Roberto

On Fri, Feb 26, 2010 at 10:39 AM, Jens Quade <jq(a)qdevelop.de> wrote:
>
> On 26.02.2010, at 09:08, roby.brunelli(a)gmail.com wrote:
>
>> I'm trying to write an RSS file (extracting information from an html page) using
>>
>> etree.ElementTree(..).write(..)
>>
>> When I create the description part of a news I insert text with special characters such as:
>>
>> È
>>
>> and when I print (or write to file) the corresponding element, I get
>>
>> &#200
>>
>> which I do not want (I want the original special char): is there a way to prevent this kind of mapping??
>
> >>> from lxml import etree
>
> >>> x = etree.XML('<test>ü</test>')
> >>> etree.ElementTree(x).write(sys.stdout)
> <test>ü</test>
>
> >>> etree.ElementTree(x).write(sys.stdout, encoding='utf-8')
> <test>ü</test>
>
> also:
>
> >>> print etree.tostring(x,encoding='utf-8')
> <test>ü</test>
>
> default encoding is ascii.
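A minimal sketch of one way to get the character itself into the output, assuming the text arrives as an HTML character reference such as the one in the question: decode the reference to a real character before assigning it to .text, then serialize with a non-ASCII encoding. This is an illustration rather than the answer given in the thread. The variable names are made up, html.unescape comes from the Python 3 standard library, and the import falls back to the stdlib ElementTree only so that the example still runs where lxml is not installed.

```python
import html

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib so the sketch runs without lxml.
    import xml.etree.ElementTree as etree

entity_text = "&#200;"                    # text as scraped from an HTML source
title_text = html.unescape(entity_text)   # decode to the single character U+00C8

root = etree.Element("rss")
etree.SubElement(root, "title").text = title_text

# Default serialization is ASCII, so the character becomes a numeric reference.
ascii_bytes = etree.tostring(root)
# Serializing to a str keeps the character; encode it afterwards as needed.
unicode_text = etree.tostring(root, encoding="unicode")
utf8_bytes = unicode_text.encode("utf-8")

# For comparison: assigning the undecoded reference escapes its ampersand.
literal = etree.Element("title")
literal.text = entity_text
escaped = etree.tostring(literal)

print(ascii_bytes)
print(unicode_text)
print(escaped)
```

Both the ASCII form with the numeric reference and the UTF-8 form describe the same document to an XML parser; the choice only affects how the bytes look on disk.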
