Comments causing malloc/bus error/segmentation fault

Apologies if this isn't the right place to send this, but I'm not sure what to do with it. I put it on StackOverflow, where it has been pretty much ignored (if anyone would prefer to answer it there).

http://stackoverflow.com/questions/16459539/malloc-and-bus-error-in-python-w...

I've been having lots of problems with comments crashing my interpreter in lxml. The simplest case that causes a malloc error is:

```
import lxml
import lxml.builder
import lxml.html

print "Causing malloc"
builder = lxml.builder.ElementMaker()
el = builder.div()
el.append(lxml.html.HtmlComment("foo"))
```

I've tested this on Python 2.6 and Python 3.2 (with just a change to the print statements). It causes segmentation faults on Python 3.2.

A fuller example that also demonstrates more of what I've been trying to do is:

```
import lxml
import lxml.builder
import lxml.html
import lxml.etree

print "Causing Bus Error"

class HtmlElement(lxml.html.HtmlElement):
    pass

class HtmlElementLookup(lxml.html.HtmlElementClassLookup):
    def lookup(self, node_type, document, namespace, name):
        if node_type == 'comment':
            return lxml.etree.Comment
        else:
            return HtmlElement

parser = lxml.html.HTMLParser()
parser.set_element_class_lookup(HtmlElementLookup())

_HTMLBuilder = lxml.builder.ElementMaker(
    makeelement=parser.makeelement,
    typemap={
        int: lambda e, i: str(i)})

class HtmlBuilder(object):
    def __getattr__(self, key):
        return getattr(_HTMLBuilder, key.lower())

builder = HtmlBuilder()

el = builder.div()
el.append(lxml.etree.Comment("foo"))
for i in el:
    print i
```

This causes a bus error on Python 2.6 and a segmentation fault on Python 3.2.

I'm not really sure how to proceed with this. I'd like either to work around it (I can quite imagine I'm using it incorrectly) or to fix it.

Thanks

Ed

Ed Singleton, 10.05.2013 13:22:
> Apologies if this isn't the right place to send this, but I'm not sure what to do with it.

Perfect place to post it.

> I've been having lots of problems with comments crashing my interpreter in lxml. The simplest case that causes a malloc error is:
>
> ```
> import lxml
> import lxml.builder
> import lxml.html
>
> print "Causing malloc"
> builder = lxml.builder.ElementMaker()
> el = builder.div()
> el.append(lxml.html.HtmlComment("foo"))
> ```

Thanks for this excellent test case. That truly is one of those "I can't believe no-one found this yet" kind of bugs. It's been there for over three years now. (Or maybe people found it and just didn't report it, so thanks for the report.)

https://github.com/lxml/lxml/commit/46e3b2570b6e63cfcb2c8bfe11467466b93b81d1

I'll see that I can get out a bug-fix release soon.

Stefan
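For anyone re-checking this on Python 3, the minimal test case might be sketched as below. It assumes an lxml build that already contains the fix from the commit above; on an affected release it would presumably still crash instead of printing anything. The serialization step at the end is an addition for illustration and is not part of the original report.

```python
import lxml.builder
import lxml.etree
import lxml.html

# Same steps as the minimal test case from the report.
builder = lxml.builder.ElementMaker()
el = builder.div()

# Append an HtmlComment created directly from its class, as in the report.
el.append(lxml.html.HtmlComment("foo"))

# Added for illustration: serialize the element to confirm the comment is attached.
serialized = lxml.etree.tostring(el)
print(serialized)
```

If the script reaches the final `print` without the interpreter aborting, the comment was attached to the element successfully.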

On 10 May 2013, at 14:08, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Ed Singleton, 10.05.2013 13:22:
> Thanks for this excellent test case. That truly is one of those "I can't believe no-one found this yet" kind of bugs. It's been there for over three years now. (Or maybe people found it and just didn't report it, so thanks for the report.)
>
> https://github.com/lxml/lxml/commit/46e3b2570b6e63cfcb2c8bfe11467466b93b81d1

Thank you for fixing it so quickly. I've checked out the master branch and can confirm that it no longer breaks.

The other test case is still causing a bus error though. I assume it's an unrelated issue.

I'm trying to create an HTML builder that returns a subclass of HtmlElement (which in my normal code has lots of additional methods).

https://github.com/Singletoned/wiseguy/blob/master/wiseguy/html.py

It seems that returning a subclass of HtmlElement in the HtmlElementLookup causes a bus error, whereas returning HtmlElement itself doesn't. I'm quite confused, as it's printing the HtmlComment that causes the error.

An example is below.

```
import lxml, lxml.builder, lxml.html, lxml.etree

print "Causing Bus Error"

class HtmlElement(lxml.html.HtmlElement):
    pass

class HtmlElementLookup(lxml.html.HtmlElementClassLookup):
    def lookup(self, node_type, document, namespace, name):
        if node_type == 'comment':
            return lxml.html.HtmlComment
        else:
            return lxml.html.HtmlElement  # This works
            return HtmlElement()  # This doesn't work

parser = lxml.html.HTMLParser()
parser.set_element_class_lookup(HtmlElementLookup())

_HTMLBuilder = lxml.builder.ElementMaker(
    makeelement=parser.makeelement,
    typemap={
        int: lambda e, i: str(i)})

class HtmlBuilder(object):
    def __getattr__(self, key):
        return getattr(_HTMLBuilder, key.lower())

builder = HtmlBuilder()

el = builder.div()
el.append(lxml.html.HtmlComment("foo"))
for i in el:
    print i
```

Thanks

Ed

Ed Singleton, 10.05.2013 18:09:
> The other test case is still causing a bus error though. I assume it's an unrelated issue.
>
> I'm trying to create an HTML builder that returns a subclass of HtmlElement (which in my normal code has lots of additional methods). https://github.com/Singletoned/wiseguy/blob/master/wiseguy/html.py
>
> It seems that returning a subclass of HtmlElement in the HtmlElementLookup causes a bus error, whereas returning HtmlElement itself doesn't. I'm quite confused, as it's printing the HtmlComment that causes the error.
>
> An example is below.
>
> ```
> import lxml, lxml.builder, lxml.html, lxml.etree
>
> print "Causing Bus Error"
>
> class HtmlElement(lxml.html.HtmlElement):
>     pass
>
> class HtmlElementLookup(lxml.html.HtmlElementClassLookup):
>     def lookup(self, node_type, document, namespace, name):
>         if node_type == 'comment':
>             return lxml.html.HtmlComment
>         else:
>             return lxml.html.HtmlElement  # This works
>             return HtmlElement()  # This doesn't work
> ```

Fixed as well. Thanks for insisting. :)

https://github.com/lxml/lxml/commit/8867ff26975d2a3e2822b15a7f41c404c3521017

Stefan
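As a rough Python 3 sketch of the builder setup discussed in this thread: the lookup below returns classes (never instances) and defers to the stock `HtmlElementClassLookup` for everything that is not an element. The names `MyElement`, `MyLookup` and `html_builder` are hypothetical and chosen only for illustration, and the sketch assumes an lxml release that includes both fixes linked above.

```python
import lxml.builder
import lxml.html


class MyElement(lxml.html.HtmlElement):
    """Hypothetical subclass; extra helper methods would go here."""


class MyLookup(lxml.html.HtmlElementClassLookup):
    def lookup(self, node_type, document, namespace, name):
        if node_type == "element":
            # Return the class itself, not an instance of it.
            return MyElement
        # Comments, PIs and entities fall back to the stock lxml.html classes.
        return super().lookup(node_type, document, namespace, name)


parser = lxml.html.HTMLParser()
parser.set_element_class_lookup(MyLookup())

# Elements made through this builder are created by the parser above,
# so their classes are chosen by MyLookup. The typemap entry converts
# int children to strings, which the builder then adds as text.
html_builder = lxml.builder.ElementMaker(
    makeelement=parser.makeelement,
    typemap={int: lambda elem, value: str(value)},
)

el = html_builder.div(html_builder.p(42))
el.append(lxml.html.HtmlComment("foo"))

# Iterating over the children was the step that crashed in the report.
for child in el:
    print(type(child).__name__, child.text)
```

Deferring to `super().lookup(...)` for non-element nodes is a design choice of this sketch, not something taken from the thread; it keeps the comment handling identical to plain `lxml.html`.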
Participants (2): Ed Singleton, Stefan Behnel