
Hi,

Dmitri Fedoruk wrote:
I ran into a problem and, by binary search (based on the change log) :), found that it first appears in 2.0.5 - it is the local file DTD resolver.
I'll take a look.
This issue originates in http://article.gmane.org/gmane.comp.python.lxml.devel/3499
In some specific cases I have to load a DTD for parsing. Even if I load it from local disk and cache it, parsing takes up to 10 times longer (40ms instead of 4ms).
So, I came up with the following (ugly) solution:
    class LocalDTDResolver(etree.Resolver):
        def __init__(self, conf):
            self.conf = conf
            self.cached = None

        def resolve(self, url, id, context):
            if not self.cached:
                self.cached = self.resolve_filename(
                    self.conf + '/vxml.dtd', context)
            return self.cached
Not that ugly, but not very helpful either. You are caching the filename, not the content. Check docloader.pxi to see how simple the machinery is here. There isn't currently a way to return a parsed document from a resolver (and I don't think libxml2 supports that), so I think the best you can do is to return the content as a cached string, thus avoiding I/O but not the parse overhead.
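A minimal sketch of that idea, assuming the DTD lives in a single local file: the resolver reads the file once and then hands the cached bytes to `resolve_string()` on every lookup. The class name `CachedDTDResolver`, the `reads` counter and the `.dtd` suffix check are made up for illustration and are not part of lxml. This only saves the repeated I/O; libxml2 still has to parse the DTD text each time.

```python
from lxml import etree


class CachedDTDResolver(etree.Resolver):
    """Illustrative resolver that caches the DTD content, not the filename."""

    def __init__(self, dtd_path):
        self.dtd_path = dtd_path  # assumed: path to the local DTD file
        self.cached = None        # DTD content as bytes, filled on first use
        self.reads = 0            # how often the file was actually read

    def resolve(self, url, public_id, context):
        # Assumption for this sketch: only requests for a *.dtd system URL
        # are answered; returning None lets other resolvers take over.
        if url is None or not url.endswith('.dtd'):
            return None
        if self.cached is None:
            with open(self.dtd_path, 'rb') as f:
                self.cached = f.read()
            self.reads += 1
        # Hand the in-memory content to the parser instead of a filename.
        return self.resolve_string(self.cached, context)
```

It would be registered with `parser.resolvers.add(CachedDTDResolver(...))` on a parser created with `load_dtd=True`.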
Systems are FreeBSD 6.2/7.0,
lxml.etree:       (2, 0, 5, 0)
libxml used:      (2, 6, 30)
libxml compiled:  (2, 6, 30)
libxslt used:     (1, 1, 22)
libxslt compiled: (1, 1, 22)
This code is run within mod_python3/apache2.2.8
Now that you mention it: are you using the single interpreter option in mod_python or does it work without? I fixed a couple of threading things in 2.0.6, so that should now work without that work-around. But it's still untested due to lack of feedback.
Before 2.0.5 I had no problem when the resolvingParser was called. But since 2.0.5 I get this:

# no call of resolving parser
[root@machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
kern.openfiles: 377

# after a single (!) call of resolving parser
[root@machine ~/trunk/fb-ports/py-lxml]$ sysctl kern.openfiles
kern.openfiles: 11439
If you are really using the above code then it means that libxml2 is reading the DTD internally. Maybe there's something more we have to clean up, or maybe it's really a leak in libxml2. But the numbers you post here look very unrealistic to me.
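To narrow this down from inside the process rather than through the system-wide `kern.openfiles` counter, a rough check like the following could help. It is only a sketch: the helper names `lowest_free_fd` and `fd_growth` are hypothetical, and it relies on the POSIX rule that `open()` returns the lowest unused descriptor, so it gives an indication of leaked descriptors in this process, not an exact count.

```python
import os


def lowest_free_fd():
    """Return the lowest descriptor number currently unused in this process."""
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)
    return fd


def fd_growth(func, *args, **kwargs):
    """Call func and report how far the lowest free descriptor moved.

    A value of 0 suggests nothing was left open; a positive value means
    at least one descriptor opened during the call is still open.
    """
    before = lowest_free_fd()
    func(*args, **kwargs)
    return lowest_free_fd() - before
```

Wrapping a single parse call in `fd_growth(...)` should show whether the descriptors are left open by that call or come from somewhere else in the server process.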
And my local DTD file is opened about 11000 times (according to fstat and find -inode).
If you parse it once, libxml2 should open the DTD file once, and not more. I'll look into that.

Stefan