[lxml-dev] pickling ElementStringResult
Hi there,

I just ran into a problem when upgrading an application from lxml 1.3.6 to lxml 2.0.x. xpath() now returns a special lxml.etree._ElementStringResult object, a smart string. A bit too smart for my situation... Previously I'd get a plain string back and could stuff it into the ZODB just fine, since it was picklable. Now I get this object, which can't be pickled, so the same code fails. Is it only xpath that is affected, or are there other operations that can return these things?

It's quite frustrating to have to worry whether a string came from an lxml xpath call (and possibly other operations?) and is therefore broken in a subtle way when you want to store it somewhere. You always worry whether your test coverage is complete enough to catch every case. This kind of bug is also relatively hard to track down even with good test coverage, as the error only occurs later, when the ZODB tries to pickle an object that has one of these special strings somewhere inside it.

Another potential problem is that a smart string held in an object database might keep the underlying XML document alive long past its intended lifetime.

I was originally thinking about coming up with a special way to pickle these smart strings, but this leads me to prefer another solution: a flag that can be passed to xpath() to turn off smart string results altogether. That would fit my use case quite well, though I'd have to remember to pass the flag every time. It might be nicer if the flag could be passed to the parser instead, but I'm not sure whether that is implementable.

Regards,
Martijn
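Until such a flag exists, the workaround is to strip the "smartness" by hand before a value reaches the ZODB. The sketch below assumes nothing about lxml internals: `Document`, `FakeSmartString` and `to_plain` are hypothetical stand-ins, where `FakeSmartString` only mimics the idea of a string subclass that carries a reference to an unpicklable document. (For reference, later lxml releases added a `smart_strings=False` keyword to `xpath()` and `XPath` that does this at the source.)

```python
import pickle


class Document:
    """Hypothetical stand-in for a parsed document that refuses to be pickled."""

    def __reduce__(self):
        raise TypeError("document objects cannot be pickled")


class FakeSmartString(str):
    """Hypothetical stand-in for a smart string: a str that remembers its parent."""

    def __new__(cls, value, parent):
        obj = super().__new__(cls, value)
        obj._parent = parent  # this reference drags the document into pickling
        return obj


def to_plain(result):
    """Convert an XPath-style result into plain built-in types before storing it."""
    if isinstance(result, str):
        return str(result)  # str() of a str subclass returns an exact str copy
    if isinstance(result, list):
        return [to_plain(item) for item in result]
    return result  # bools, floats and other values pass through unchanged


smart = FakeSmartString("hello", Document())

try:
    pickle.dumps(smart)
except TypeError as exc:
    print("pickling the smart string failed:", exc)

plain = to_plain([smart, 1.5, True])
print(type(plain[0]).__name__, pickle.loads(pickle.dumps(plain)))
```

The catch is exactly the one described above: every code path that stores XPath results has to remember to call the helper.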
Hi, Martijn Faassen wrote:
Previously, I'd get a string back and I could stuff that into the ZODB just fine - it was picklable. Now I get this and it can't be pickled, so the same code fails. Is it only xpath that is so affected or are there other operations where these things can be returned?
No, only for XPath results (wherever they occur).
Another potential problem is that a smart string when held in an object database might keep the underlying XML document alive long past its intended lifetime.
I was originally thinking about coming up with a special way to pickle these smart strings, but this actually leads me to prefer another solution: a flag that can be passed to xpath() that turns off the returning of smart strings at all.
Wouldn't it be enough to pickle the string subclass as a plain (unicode) string? You would obviously lose information that way, but pickling the string result together with the entire tree would be much more surprising IMHO.

Stefan
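Stefan's suggestion maps onto Python's standard `__reduce__` hook: the subclass tells pickle to rebuild it as a plain `str` from its text alone. A minimal sketch, assuming a hypothetical `FakeSmartString` stand-in rather than lxml's real class:

```python
import pickle


class FakeSmartString(str):
    """Hypothetical stand-in for an XPath smart string (not lxml's real class)."""

    def __new__(cls, value, parent):
        obj = super().__new__(cls, value)
        obj._parent = parent
        return obj

    def __reduce__(self):
        # Pickle only the text: unpickling yields a plain str and the
        # parent link is deliberately dropped.
        return (str, (str(self),))


class Document:
    """Hypothetical stand-in for the source tree; it refuses to be pickled."""

    def __reduce__(self):
        raise TypeError("document objects cannot be pickled")


smart = FakeSmartString("hello", Document())
restored = pickle.loads(pickle.dumps(smart))
print(repr(restored), type(restored).__name__)
```

The document is never touched during pickling, and the restored value is an ordinary string with no way back to its parent, which is the information loss Stefan mentions.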
Hi there, On Fri, Jun 27, 2008 at 7:50 AM, Stefan Behnel <stefan_ml@behnel.de> wrote: [snip]
Wouldn't it be enough to pickle the string subclass as a plain (unicode) string? You would obviously lose information that way, but pickling the string result together with the entire tree would be much more surprising IMHO.
Sorry for being unclear; I'm not suggesting that the entire tree should be pickled.

The ZODB has a cache, which simply holds some of the "recently touched" Python objects in memory. I have no idea how long the object in question will remain in the ZODB cache (i.e. stay around as a normal Python object in memory). It could be there for hours or days, depending on activity, cache size, etc. The smart string keeps the document it came from alive, possibly way past the expected time, and the document will only be collected once the object is removed from the ZODB cache, which I can't predict very well. Only when the smart string is pickled would the reference to the document be broken. At least, I *think* this is how it works. So, while pickling happens immediately, the object doesn't disappear right after pickling, and the reference stays alive.

I think in general smart strings behave somewhat unexpectedly in the face of potentially long-running processes. One is inclined to treat them as strings, but their memory behavior is quite different.

Regards,
Martijn
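The lifetime effect Martijn describes can be observed with a weak reference to the document. In this sketch `Document` and `FakeSmartString` are hypothetical stand-ins, and the `cached` dict plays the role of an object sitting in a long-lived cache such as the ZODB's; the prints rely on CPython freeing objects as soon as their last reference goes away.

```python
import gc
import weakref


class Document:
    """Hypothetical stand-in for a parsed XML document."""


class FakeSmartString(str):
    """Hypothetical smart string holding a strong reference to its document."""

    def __new__(cls, value, parent):
        obj = super().__new__(cls, value)
        obj._parent = parent
        return obj


doc = Document()
doc_ref = weakref.ref(doc)  # observe the document without keeping it alive
cached = {"title": FakeSmartString("hello", doc)}  # an object held by a cache
del doc
gc.collect()
print("document alive while cached:", doc_ref() is not None)

cached["title"] = str(cached["title"])  # keep only a plain copy of the text
gc.collect()
print("document alive after plain copy:", doc_ref() is not None)
```

As long as the cached object holds the smart string, the whole document stays in memory; replacing it with a plain copy lets the document be collected.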
Hi, Martijn Faassen wrote:
The ZODB has a cache, which is simply some of the "recently touched" Python objects in memory.
Why would an XPath string result end up in that cache in the first place?
I think in general smart strings behave somewhat unexpectedly in the face of potentially long-running processes. One is inclined to treat them as strings, but their memory behavior is quite different.
True. But the only way I can see to work around this internally is a weak reference, and Elements are not currently weakly referenceable. I never tried, but I would imagine that adding a "__weakref__" to the _Element class involves some overhead. IIRC, this adds an extra per-instance field (the weak reference list) to each object.

I could also imagine giving the smart strings a method ".toplainstring()" that would return a plain string value without the parent link. That way, users who want to pass the string on to a potentially long-lived place can unlink it from its parent.

Your proposal of configuring this behaviour on a parser (the XML parser, not the XPath parser) isn't impossible either, since we already pass a _Document (with a parser reference) into the XPath value unpacker. But I'm not convinced that that is the right place for such an option. Doing it in the XPath class looks harder at first sight.

Stefan
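Both of Stefan's internal options can be sketched in plain Python. Since lxml Elements were not weakly referenceable at the time, this uses a hypothetical `Element` stand-in; `WeakSmartString` is likewise hypothetical, `getparent()` is named after the accessor lxml's string results offer, and `toplainstring()` is only the method proposed above, not an existing lxml API. The `None` result after `del elem` relies on CPython's immediate reference-count cleanup.

```python
import weakref


class Element:
    """Hypothetical stand-in for a tree element; plain Python classes support weakrefs."""

    def __init__(self, tag):
        self.tag = tag


class WeakSmartString(str):
    """Hypothetical smart string that links to its parent only through a weak reference."""

    def __new__(cls, value, parent):
        obj = super().__new__(cls, value)
        obj._parent_ref = weakref.ref(parent)  # does not keep the parent alive
        return obj

    def getparent(self):
        # Returns None once the element (and its tree) has been collected.
        return self._parent_ref()

    def toplainstring(self):
        # The proposed explicit unlink: an exact str with no parent link at all.
        return str(self)


elem = Element("title")
s = WeakSmartString("hello", elem)
print(s.getparent().tag)
del elem
print(s.getparent())
print(type(s.toplainstring()).__name__)
```

The weak link solves the lifetime problem but not the pickling one: a `weakref.ref` attribute cannot be pickled either, so storing such a string would still need a plain copy or a custom `__reduce__`.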
participants (2)
- Martijn Faassen
- Stefan Behnel