[lxml-dev] lxml 1.0.3 and lxml 1.1beta builds for various platforms?

Hey,

Compare:

http://cheeseshop.python.org/pypi/lxml/1.0.2

with

http://cheeseshop.python.org/pypi/lxml/1.0.3

and we see that 1.0.2 has support for lots of different platforms, including the nice static Windows build, but 1.0.3 does not. In part this is my fault, as it appears I need to do various Linux eggs, but a couple more egg donations from others would be appreciated! The same story applies to 1.1 beta.

Regards,

Martijn

Hi,

Martijn Faassen wrote:
> we see that 1.0.2 has support for lots of different platforms, including the nice static windows build, but 1.0.3 has not.

It's summer holiday time, I guess that's the reason. Since there was a crash bug in 1.0.3, I'll release a 1.0.4 soon, so it's not too much of a problem if eggs are missing for 1.0.3. But since I then really, /really/ hope that that'll finally be the last 1.0 release necessary, I'll be as happy as Martijn to see egg contributions.

Stefan

Hi,

depending on how one accesses objectified elements, there can be differences in the resulting element type:

>>> root = objectify.Element('root')
>>> sub = objectify.Element('root')
>>> root.sub = sub
>>> root.sub.x = 1
>>> del root.sub.x
>>> print root
root = None [ObjectifiedElement]
    sub = '' [StringElement]

This yields a StringElement root.sub because root.sub has no element content; it has a parent element but no children. Whereas

>>> root = objectify.Element('root')
>>> sub = objectify.Element('root')
>>> root.sub = sub
>>> root.sub.x = 1
>>> print root
root = None [ObjectifiedElement]
    sub = None [ObjectifiedElement]
        x = 1 [IntElement]
>>> del root.sub.x
>>> print root
root = None [ObjectifiedElement]
    sub = None [ObjectifiedElement]
>>>

yields an ObjectifiedElement root.sub because I already accessed root.sub before deleting its child x, thus making it an ObjectifiedElement in the etree node proxy, because at that time it had children.

I'm not sure how to address this problem. For my use case it is desirable for

- empty content leaf elements to be StringElements, just like it is today. E.g. when parsing from XML something like '<root><s/></root>', s should be a StringElement (empty string, leaf node). Also, assigning an empty string in objectify should end up in a StringElement:

>>> root.s = ''
>>> print root
root = None [ObjectifiedElement]
    s = '' [StringElement]
>>>

- a "structural" element (this is what I use ObjectifiedElements for; they are supposed to potentially have children) to remain what it is even if its children get deleted.

The problem also manifests in this use case:

>>> root = objectify.Element('root')
>>> root.sub = objectify.Element('whatever')
>>> print root
root = None [ObjectifiedElement]
    sub = '' [StringElement]
>>>

where I would rather have root.sub be an ObjectifiedElement.

And I'm also the one to blame for the current behaviour, because I proposed parts of the class lookup order to Stefan :-)

Some thoughts:

- maybe disallow DataElements to have children, i.e. disable __setattr__ and the like for DataElements? Then ObjectifiedElements would need to have an accessible (string) pyvalue, in contrast to the current behaviour
- maybe change the time an object is actually registered in the node proxy?
- add an additional "structural" element class that is basically just an ObjectifiedElement but has an artificial pytype to make it retain its type, and that can be produced by a factory similar to objectify.DataElement?
- just not care about a StringElement acting as a structural element, as it can currently have children too (though it supports the string API parts on top of the ObjectifiedElement basic API)?

Greetings,

Holger

The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
It's even worse:
This is pretty wrong. The thing that bothers me is that there should not actually be a permanent Python reference to root.sub, which would normally mean that the object should get recreated each time it is accessed. But as the last command shows, that is not the case.
That's only a problem if you access the Python reference of the child itself afterwards, which you normally wouldn't if it's a pure structural element.
Sure, but I'd figure that's a rare use case anyway. And if you need it, there are enough ways to get around it, from parsing to ObjectPath.
Not a good idea. In that case, things like this would potentially stop working:
Reason: as it stands now, root.sub would become a StringElement, which would not accept any children.
> - maybe change the time an object is actually registered in the node proxy?
It's difficult to avoid instantiating element objects when setting and modifying content. The main reason is that if we don't have a proxy, we have to clean up the element ourselves, which means code duplication and/or a tighter code coupling between etree and objectify.
Hmm, we could potentially allow "ObjectifiedElement" as pytype, though I'd prefer waiting for a really good reason to do that.
That leads to the problem I pointed out at the top. What is your actual reasoning for requiring that empty leaf elements should be StringElements? I mean, you could always make them StringElements explicitly by setting
root.a.b.c.d = ''
and you can always explicitly access their String value with ".text". If we removed that special case, leaf elements that contain strings would always be StringElements and empty leaves and internal elements would always be ObjectifiedElements. That would not change the fact that elements keep their type as long as there is a Python reference to them, but it would work in a few more cases than it does now. Stefan

Hi Stefan,

lxml-dev-bounces@codespeak.net wrote on 05.09.2006 21:48:32:
> [...] there are enough ways to get around it, from parsing to ObjectPath.

This was just another way to describe the behaviour you pointed out above. I want it to be an ObjectifiedElement because I know I'll put children in it later.

> [...] proxy?
> It's difficult to avoid instantiating element objects when setting and modifying content. The main reason is that if we don't have a proxy, we have [...]

I know, it's not nice. But right now I can't think of another way to force a leaf to be an ObjectifiedElement.

Wouldn't this end up in d being an ObjectifiedElement if the logic (empty leaves are StringElements) changed?

When parsing from XML I need '<root><s>some string</s></root>' to behave like '<root><s></s></root>'. For someone processing the data, "s" should always act like a (possibly empty) string. Your solution would only work for me if ObjectifiedElement got a .pyval attribute, too, if its .text was not None but rather '' when there is no text content in the node, and it would probably also need the String API parts.

Much of this stems from the fact that ElementTree's elt.text returns None if there is no element text instead of '' (but I guess this won't change :-)

Holger

Holger Joukl wrote:

No. The value is an empty string, not an empty value. So there is text content in there, it's just of length zero.

> Much of this stems from the fact the ElementTree elt.text returns None if there is no element text instead of '' (but I guess this won't change :-)

It returns '' if the value is '' and it returns None if there is no value. That already changed to adapt to ET's own behaviour. The parser sees "<a/>" and "<a></a>" as not having a value. So you will never get an empty string back from a parsed tree. However, if you set it to '', lxml will continue to return an empty string and objectify will determine that it is a StringElement.

Maybe you could get by with wrapper functions that add the '' for leaves where required?

Stefan
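To make the None versus '' distinction concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml, which is an assumption made only so that it runs without lxml installed; the message above says lxml was adapted to match ET here.

```python
import xml.etree.ElementTree as ET

# Both spellings of an empty element parse to "no value": .text is None.
self_closed = ET.fromstring("<a/>")
open_close = ET.fromstring("<a></a>")
print(repr(self_closed.text), repr(open_close.text))

# An explicitly assigned empty string stays an empty string.
assigned = ET.Element("a")
assigned.text = ""
print(repr(assigned.text))
```

So a parsed tree never hands back '' for an empty element, while a value set to '' by hand is preserved.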

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote on 06.09.2006 11:27:43:

Hm, I could of course "stringify" all empty leaves after parsing, given that my users aren't accessing the etree/objectify APIs (e.g. fromstring()) directly. But I'd have to iterate over the whole tree for this.

BTW: What do you think about adding .encode(...) to StringElement?

Something we've discussed before: Would it make sense to allow an ObjectifiedElement instance to change its element.text internally, e.g. in its _init() method? Or do you think it is better to stay explicit, loop over the tree and replace elements as needed? My use case is the DatetimeElement class I'm using, where I will probably want to change the text to an ISO format datetime.

Holger

Hi Holger,

Holger Joukl wrote:

That's a matter of documentation. The best way is to write a small Python wrapper around the objectify module and have them import /that/.

from lxml.objectify import *
from lxml import objectify

def fromstring(xml):
    return _fixItUp( objectify.fromstring(xml) )
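A fuller version of such a wrapper might look roughly like the sketch below. It is written against the standard library's xml.etree.ElementTree instead of lxml.objectify so that it runs on its own, and `_fix_it_up` is a hypothetical stand-in for the `_fixItUp` placeholder above, not an existing function.

```python
import xml.etree.ElementTree as ET


def _fix_it_up(root):
    """Give every empty leaf an explicit '' as text (hypothetical helper)."""
    for elem in root.iter():
        has_children = len(elem) > 0
        if not has_children and elem.text is None:
            elem.text = ""
    return root


def fromstring(xml):
    """Parse XML, then normalise empty leaves so they read as ''."""
    return _fix_it_up(ET.fromstring(xml))


root = fromstring("<root><s/><t>some string</t><u><v/></u></root>")
print(repr(root.find("s").text), repr(root.find("u").text))
```

Elements that have children are left alone, so only leaves are "stringified". The cost is the full tree walk discussed in this thread.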
> But I'd have to iterate over the whole tree for this.

Sure, it's much easier with the right class in place. And having all elements instantiated during iteration isn't quite the most efficient thing ever. You could reduce the effort with a smart XPath expression, though.

One thing that comes to my mind is that we could add support for replacing the default type classes used by ObjectifyElementClassLookup. We could add keyword arguments so that you could say

lookup = ObjectifyElementClassLookup(StringElement=MyStringElementClass)

That would currently work for String-, None- and ObjectifiedElement only, as the others use the data type registry. Maybe we should rather support something like "default_data_class" and "default_tree_class" (and keep the NoneElement, which is only used in a well defined case anyway). Then again, what about "empty_class", "xsi_nil_class" and "tree_class"? Any preference or comments?

> BTW: What do you think about adding .encode(...) to StringElement?

Python's string objects have 35 documented methods, most of which we could implement (although some of them, like "index" and "find", already have a different meaning in etree/objectify). If we consider implementing one, we should rather have all of them in place. Don't know if it's worth it. As the documentation says, if you want a real string, use ".text".

I think it would be a good idea to add a method "__setText(s)" to ObjectifiedDataElement. That would make it available to subclasses and at the same time make it clear that it is *no* public API.

Stefan

Hi Stefan,

lxml-dev-bounces@codespeak.net wrote on 06.09.2006 22:28:39:

I'm perfectly happy with the current solution, except for setattr-ing a 'structural element' and wanting this to remain instead of becoming a StringElement. I don't quite see how a different default data class or a different tree class achieves this. So I'm back to suggesting a TreeElement() factory (not the best name, maybe) returning an ObjectifiedElement with a new pytype='ObjectifiedElement', which keeps it from becoming a string. I think that's still nicer than "stringifying" every single empty leaf when parsing from XML.

> [...] the documentation says, if you want a real string, use ".text".

Guess you're right, let's keep things simple. I was worried about e.g. printing to stdout, but then again someone should probably convert all data to unicode and then encode to their preferred encoding, as they'll have to deal with numbers and stuff anyway (which don't have an .encode method, either).

> [...] the same time make it clear that it is *no* public API.

+1 for that. If somebody wants a user to be able to modify in place, he can then also add a public method to his data classes.

Holger

Hi Holger,

I didn't wait for this to settle for 1.1.0, but it can become available in 1.1.1 if we see fit.

Holger Joukl wrote:
I chose "tree_class" and "empty_data_class" now. I think that's sufficiently telling.
Well, the idea is that you can change the default for empty data classes (remember that it's a pretty arbitrary decision to default to StringElement here) and also use subclasses of ObjectifiedElement for the tree structure. However, if you want StringElement in some cases and ObjectifiedElement in other cases, that's difficult to achieve at the Python level, as it would require passing information about the C node to allow taking the decision.
What about adding the attribute in objectify.Element()? You can't normally change the data value of an Element itself, so the only real reason why you would call objectify.Element() is to create a structural element (usually a root node). I called the corresponding pytype value "TREE" for now, I think it's unlikely that someone would use that as custom type name. Stefan

Hello Stefan,

first of all, congrats on bringing out the 1.1 release! We are currently working hard on basing our toolkit on lxml.objectify and so far it works like a charm. Especially the ObjectPath functionality and the ease of hooking custom element classes into the lookup mechanism are great. I really think anybody considering using amara.bindery or gnosis.objectify should have a good look at lxml.objectify.

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote on 14.09.2006 19:29:07:
Not in a hurry :-)
I'm still not quite sure what you mean by that. In my words: that will allow for customizing the behaviour when encountering

- an empty leaf node (empty_data_class gets chosen)
- an empty tree node, i.e. a node that contains no text but has children (tree_class gets chosen)

This is to not force an objectify user to follow our arbitrary (though with good reason ;-) decision to use StringElement for empty leaves. Right?

Great, I'll try it out. But I'm still voting for a TreeElement() factory, as I'll have to write something like this anyway:

def TreeElement():
    return objectify.Element('tree', {objectify.PYTYPE_ATTRIBUTE: 'TREE'})

Such a factory would complement the objectify module interface nicely, given there's also the DataElement function, imho.

Btw., I've found that I rather often use ElementBase.__len__(<someObjectifiedObj>) to get the child count or to find out whether an element has children. What do you think about adding a hasChildren() or countChildren() function to objectify?

Regards,

Holger

## forgot to send this to the list also ## Hi Holger, Holger Joukl (Holger.Joukl@LBBW.de) wrote:
Cool, that's good to know.
> Especially the ObjectPath functionality and the ease of hooking custom element classes into the lookup mechanism is great.

:) Custom element classes were one of the first things I implemented when I came to lxml. And I really like the way they now fit into the new lookup framework.

> I really think anybody considering to use amara.bindery or gnosis.objectify should have a good look at lxml.objectify.
Hear, hear! :) Can we quote you on our web page? Like: "Google uses Python, but for critical stuff, my bank depends on lxml, because ..."
Right. There are a few more rules:

  has no parent -> tree
  has xsi:nil attribute -> NullElement
  has parsable type -> type class
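Taken together with the empty_data_class and tree_class cases described earlier in the thread, the rules could be sketched as a plain decision function. This is only an illustration over standard-library ElementTree elements: the order of the checks, the `has_parent` flag (ET elements carry no parent pointer) and the returned label strings are all assumptions, and none of this is lxml's actual lookup code.

```python
import xml.etree.ElementTree as ET

XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def _parses_as(text, converter):
    """Return True if converter(text) succeeds."""
    try:
        converter(text)
    except ValueError:
        return False
    return True


def choose_class(elem, has_parent):
    """Return a label for the class such a rule set might pick (assumed order)."""
    if len(elem) > 0:
        return "tree_class"            # has children
    if elem.get(XSI_NIL) == "true":
        return "NullElement"           # has xsi:nil attribute
    if not has_parent:
        return "tree_class"            # has no parent
    if elem.text is None:
        return "empty_data_class"      # empty leaf
    if _parses_as(elem.text, int):
        return "IntElement"            # has parsable type
    if _parses_as(elem.text, float):
        return "FloatElement"
    return "StringElement"


root = ET.fromstring(
    '<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<a>1</a><b/><c xsi:nil="true"/><d><e>x</e></d><f>1.5</f></root>'
)
results = {child.tag: choose_class(child, has_parent=True) for child in root}
print(choose_class(root, has_parent=False), results)
```

An explicitly assigned '' falls through to the string case here, which matches the behaviour described earlier for `root.s = ''`.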
For that, yes, and for easily replacing the inner tree classes. I think that can be a pretty helpful thing if you want to extend the API. You can now replace the type classes through PyType, and the inner tree class and the default leaf class through the lookup mechanism. I think that's all you might need to extend objectify.
No, that's redundant. Just use objectify.Element('tree') As I said, the main reason to call Element() is to create a tree element. If you want a data element, call DataElement(). So, no TreeElement() needed.
Interesting. I didn't realise there isn't really a way to find out about the children. They are in dir(), but only together with all methods etc. There's .getchildren(), but that builds all children, so it's less efficient than just counting them. A countchildren() method on ObjectifiedElement would match getchildren(), but it also adds another name that cannot be used to look up children. Maybe "countchildren" is a good one in that regard, though, as it's not really a good name for an XML tag. I'm -0 on haschildren(), though, as you can always call countchildren() if you expect the number to be small (note that this is about data binding, so very large documents are unlikely already) or use iterchildren().next() if you expect it to be really large. Child traversal is so fast that it shouldn't make too much of a difference if you count 100 children or only the first one. The method call overhead may even be the dominating factor here... I'll add a countchildren() method, then. Stefan
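The trade-off between counting all children and only probing for the first one can be sketched as follows. This uses the standard library's ElementTree, where iterating an element yields its direct children; the two helper functions are hypothetical illustrations, not the method that was added to objectify.

```python
import xml.etree.ElementTree as ET


def countchildren(elem):
    """Count direct children by walking over them once."""
    return sum(1 for _ in elem)


def haschildren(elem):
    """Stop at the first child instead of counting all of them."""
    return next(iter(elem), None) is not None


root = ET.fromstring("<root><a/><a/><b/></root>")
print(countchildren(root), haschildren(root), haschildren(root.find("b")))
```

For the small documents typical of data binding, the difference between the two should rarely matter, as noted above.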

Hi,

the type registration of LongElement in _registerPyTypes in objectify.pyx (line 833 in my not totally up-to-date 1.1 checkout) should be changed from

pytype = PyType('long', None, LongElement)

to

pytype = PyType('long', long, LongElement)

to activate its actual usefulness. Funny I never noticed that before.

Cheers,

Holger

Holger Joukl wrote:
You didn't notice it because it is not used. As Python's int() accepts both ints and longs, there is no use in parsing values with both. It is only used for resolving the 'long' pytype hint, not for parsing longs. That's why the type check function is None. Stefan

Hi Stefan,

lxml-dev-bounces@codespeak.net wrote on 20.09.2006 16:51:20:
> [...] the type check function is None.

Yeah right. But if it were changed you'd be able to do

Still, you could of course achieve such a result by using DataElement.

Holger

Hi,

find attached a little doc patch proposal for objectify.txt, laying out the data type determination in (even greater) detail, plus minor additions for the additional data class workings.

I also suggest changing doc/mkhtml.py:

36,37c36,37
< command = ('%s --stylesheet=%s --link-stylesheet %s > %s' %
<            (script, stylesheet_url, source_path, dest_path))
---
> command = ('%s %s --stylesheet=%s --link-stylesheet %s > %s' %
>            (sys.executable, script, stylesheet_url, source_path, dest_path))

which allows for running things like

PYTHON=/apps/pydev/gcc/3.4.4/bin/python2.4 make -e html

with the non-standard interpreter path actually getting picked up for the rest2html invocation.

Holger

(See attached file: objectify.txt.diff.txt)
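The effect of the proposed change can be shown in isolation with a small sketch. The function name and the file names passed to it are hypothetical; only the format string and the use of sys.executable come from the patch.

```python
import sys


def build_command(script, stylesheet_url, source_path, dest_path):
    """Prefix the script with the interpreter that is running this code."""
    return ('%s %s --stylesheet=%s --link-stylesheet %s > %s' %
            (sys.executable, script, stylesheet_url, source_path, dest_path))


# Hypothetical arguments, for illustration only.
command = build_command('rest2html.py', 'style.css',
                        'objectify.txt', 'objectify.html')
print(command)
```

Because sys.executable is whatever interpreter runs mkhtml.py, the script gets invoked with that same interpreter instead of the default one on the path.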

Hi Holger, Holger Joukl wrote:
I had to change a few passages, but mainly applied it as you wrote it. Please look through objectify.txt in the trunk (or 1.1 branch) to see if it fits your intention.
I applied that, too. Thanks for contributing. Stefan

Hi, Martijn Faassen wrote:
we see that 1.0.2 has support for lots of different platforms, including the nice static windows build, but 1.0.3 has not.
It's summer holiday time, I guess that's the reason. Since there was a crash bug in 1.0.3, I'll release a 1.0.4 soon, so it's not too much of a problem if eggs are missing for 1.0.3. But since I then really, /really/ hope that that'll finally be the last 1.0 release necessary, I'll be as happy as Martijn to see egg contributions. Stefan

Hi, depending on how one accesses objectified elements there can be differences in the resulting element type: >>> root = objectify.Element('root') >>> sub = objectify.Element('root') >>> root.sub = sub >>> root.sub.x = 1 >>> del root.sub.x >>> print root root = None [ObjectifiedElement] sub = '' [StringElement] This yields a StringElement root.sub because root.sub has no element contents, does have a parent element but not any children. Whereas >>> root = objectify.Element('root') >>> sub = objectify.Element('root') >>> root.sub = sub >>> root.sub.x = 1 >>> print root root = None [ObjectifiedElement] sub = None [ObjectifiedElement] x = 1 [IntElement] >>> del root.sub.x >>> print root root = None [ObjectifiedElement] sub = None [ObjectifiedElement] >>> yields an ObjectifiedElement root.sub because I already accessed root.sub before deleting its child x, thus making it an ObjectifiedElement in the etree node proxy because at that time it had children. I'm not sure how to address this problem. For my use case it is desirable for - empty content leaf elements to be StringElements, just like it is today: E.g. when parsing from xml s.th. like '<root><s/></root>' then s should be a StringElement (empty string, leaf node). Also when assigning an empty string in objectify this should end up in a StringElement: >>> root.s = '' >>> print root root = None [ObjectifiedElement] s = '' [StringElement] >>> - a "structural" element (this is what I use ObjectifiedElements for - they are supposed to potentially have children) to remain like it is even if its children get deleted The problem also manifests in this use case: >>> root = objectify.Element('root') >>> root.sub = objectify.Element('whatever') >>> print root root = None [ObjectifiedElement] sub = '' [StringElement] >>> where I would rather have root.sub to be an ObjectifiedElement. 
And I'm also the one to blame for the current behaviour because I proposed parts of the class lookup order to Stefan :-) Some thoughts: - maybe disallow DataElements to have children, i.e. disabling __setattr__ and alike for DataElements? Then ObjectifiedElements would need to have an accessible (string) pyvalue in contrast to current behaviour - maybe change the time an object is actually registered in the node proxy? - add an additional "structural" element class that is basically just an ObjectifiedElement but has an artificial pytype to make it retain its type and can be produced by a factory similar to objectify.DataElement? - just not care about a StringElement acting as a structural element as it can currently have children too (though it supports the string API parts on top of the ObjectifiedElement basic API)? Greetings, Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Hi Holger, Holger Joukl wrote:
It's even worse:
This is pretty wrong. The thing that bothers me is that there should not actually be a permanent Python reference to root.sub, which would normally mean that the object should get recreated each time it is accessed. But as the last command shows, that is not the case.
That's only a problem if you access the Python reference of the child itself afterwards, which you normally wouldn't if it's a pure structural element.
Sure, but I'd figure that's a rare use case anyway. And if you need it, there are enough ways to get around it, from parsing to ObjectPath.
Not a good idea. In that case, things like this would potentially stop working:
Reason: as it stands now, root.sub would become a StringElement, which would not accept any children.
- maybe change the time an object is actually registered in the node proxy?
It's difficult to avoid instantiating element objects when setting and modifying content. The main reason is that if we don't have a proxy, we have to clean up the element ourselves, which means code duplication and/or a tighter code coupling between etree and objectify.
Hmm, we could potentially allow "ObjectifiedElement" as pytype, though I'd prefer waiting for a really good reason to do that.
That leads to the problem I pointed out at the top. What is your actual reasoning for requiring that empty leaf elements should be StringElements? I mean, you could always make them StringElements explicitly by setting
root.a.b.c.d = ''
and you can always explicitly access their String value with ".text". If we removed that special case, leaf elements that contain strings would always be StringElements and empty leaves and internal elements would always be ObjectifiedElements. That would not change the fact that elements keep their type as long as there is a Python reference to them, but it would work in a few more cases than it does now. Stefan

Hi Stefan, lxml-dev-bounces@codespeak.net schrieb am 05.09.2006 21:48:32:
normally there
are enough ways to get around it, from parsing to ObjectPath.
This was just another way to describe the behaviour you put out above. I want it to be an ObjectifiedElement because I know I'll put children in it later. proxy?
It's difficult to avoid instantiating element objects when setting and modifying content. The main reason is that if we don't have a proxy, we
have
I know, it's not nice. But right now I can't think of another way to force a leaf to be an ObjectifiedElement. parts on
Wouldn't this end up in d being an ObjectifiedElement if the logic (empty leaves are StringElements) changed? there
When parsing from XML I need '<root><s>some string</s></root>' to behave like '<root><s></s></root>'. For someone processing the data "s" should always act like a (possibly empty) string. Your solution would only work for me if ObjectifiedElement got a .pyval attribute, too, and its .text was not None but rather '' if no text content is in the node, and probably also needed the String API parts. Much of this stems from the fact the ElementTree elt.text returns None if there is no element text instead of '' (but I guess this won't change :-) Holger Der Inhalt dieser E-Mail ist vertraulich. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, verständigen Sie bitte den Absender sofort und löschen Sie die E-Mail sodann. Das unerlaubte Kopieren sowie die unbefugte Übermittlung sind nicht gestattet. Die Sicherheit von Übermittlungen per E-Mail kann nicht garantiert werden. Falls Sie eine Bestätigung wünschen, fordern Sie bitte den Inhalt der E-Mail als Hardcopy an. The contents of this e-mail are confidential. If you are not the named addressee or if this transmission has been addressed to you in error, please notify the sender immediately and then delete this e-mail. Any unauthorized copying and transmission is forbidden. E-Mail transmission cannot be guaranteed to be secure. If verification is required, please request a hard copy version.

Holger Joukl wrote:
No. The value is an empty string, not an empty value. So there is text content in there, it's just of length zero.
Much of this stems from the fact the ElementTree elt.text returns None if there is no element text instead of '' (but I guess this won't change :-)
It returns '' if the value is '' and it returns None if there is no value. That already changed to adapt to ET's own behaviour. The parser sees "<a/>" and "<a></a>" as not having a value. So you will never get an empty string back from a parsed tree. However, if you set it to '', lxml will continue to return an empty string and objectify will determine that it is a StringElement. Maybe you could get by with wrapper functions that add the '' for leaves where required? Stefan
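A minimal sketch of the wrapper idea, using the standard library's xml.etree.ElementTree instead of lxml (an assumption, made only so that the snippet has no dependencies). The helper names stringify_empty_leaves and fromstring are hypothetical. It illustrates that both "<a/>" and "<a></a>" parse to a text of None, and that an explicitly assigned '' is kept:

```python
import xml.etree.ElementTree as ET


def stringify_empty_leaves(root):
    """Give every childless element without text an explicit empty string."""
    for element in root.iter():
        if len(element) == 0 and element.text is None:
            element.text = ""
    return root


def fromstring(xml):
    """Hypothetical wrapper: parse, then fix up the empty leaves."""
    return stringify_empty_leaves(ET.fromstring(xml))


XML = "<root><a/><b></b><c>text</c></root>"

parsed = ET.fromstring(XML)
fixed = fromstring(XML)

print([child.text for child in parsed])
print([child.text for child in fixed])
```

The fix-up still walks the whole tree once, which is the cost discussed in the thread.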

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote on 06.09.2006 11:27:43:
Hm, I could of course "stringify" all empty leaves after parsing, given that my users aren't accessing the etree/objectify APIs e.g. fromstring() directly. But I'd have to iterate over the whole tree for this. BTW: What do you think about adding .encode(...) to StringElement? Something we've discussed before: Would it make sense to allow an ObjectifiedElement instance to change its element.text internally, like e.g. in its _init() method? Or do you think it is better to stay explicit, loop over the tree and replace elements as needed? My use case is the DatetimeElement class I'm using where I will probably want to change the text to iso format datetime. Holger

Hi Holger, Holger Joukl wrote:
That's a matter of documentation. The best way is to write a small Python wrapper around the objectify module and have them import /that/.
    from lxml.objectify import *
    from lxml import objectify
    def fromstring(xml):
        return _fixItUp( objectify.fromstring(xml) )
But I'd have to iterate over the whole tree for this.
Sure, it's much easier with the right class in place. And having all elements instantiated during iteration isn't quite the most efficient thing ever. You could reduce the effort with a smart XPath expression, though.
One thing that comes to my mind is that we could add support for replacing the default type classes used by ObjectifyElementClassLookup. We could add keyword arguments so that you could say
    lookup = ObjectifyElementClassLookup(StringElement=MyStringElementClass)
That would currently work for String-, None- and ObjectifiedElement only, as the others use the data type registry. Maybe we should rather support something like "default_data_class" and "default_tree_class" (and keep the NoneElement, which is only used in a well defined case anyway). Then again, what about "empty_class", "xsi_nil_class" and "tree_class"? Any preference or comments?
BTW: What do you think about adding .encode(...) to StringElement?
Python's string objects have 35 documented methods, most of which we could implement (although some of them, like "index" and "find" already have a different meaning in etree/objectify). If we consider implementing one, we should rather have all of them in place. Don't know if it's worth it. As the documentation says, if you want a real string, use ".text".
I think it would be a good idea to add a method "__setText(s)" to ObjectifiedDataElement. That would make it available to subclasses and at the same time make it clear that it is *no* public API. Stefan

Hi Stefan, lxml-dev-bounces@codespeak.net wrote on 06.09.2006 22:28:39:
I'm perfectly happy with the current solution except for setattr-ing a 'structural element' and wanting this to remain instead of becoming a StringElement. I don't quite see how a different default data class or a different tree class achieves this. So I'm back to suggesting a TreeElement() factory (not the best name, maybe) returning an ObjectifiedElement with a new pytype='ObjectifiedElement' which keeps it from becoming a string. I think that's still nicer than "stringifying" every single empty leaf when parsing from XML.
As the documentation says, if you want a real string, use ".text".
Guess you're right, let's keep things simple. I was worried about e.g. printing to stdout but then again someone should probably convert all data to unicode and then encode to his preferred encoding, as he'll have to deal with numbers and stuff anyway (which don't have an .encode method, either).
That would make it available to subclasses and at the same time make it clear that it is *no* public API.
+1 for that. If somebody wants a user to be able to modify in place he can then also add a public method to his data classes. Holger

Hi Holger, I didn't wait for this to settle for 1.1.0, but it can become available in 1.1.1 if we see it fit. Holger Joukl wrote:
I chose "tree_class" and "empty_data_class" now. I think that's sufficiently telling.
Well, the idea is that you can change the default for empty data classes (remember that it's a pretty arbitrary decision to default to StringElement here) and also use subclasses of ObjectifiedElement for the tree structure. However, if you want StringElement in some cases and ObjectifiedElement in other cases, that's difficult to achieve at the Python level, as it would require passing information about the C node to allow taking the decision.
What about adding the attribute in objectify.Element()? You can't normally change the data value of an Element itself, so the only real reason why you would call objectify.Element() is to create a structural element (usually a root node). I called the corresponding pytype value "TREE" for now, I think it's unlikely that someone would use that as custom type name. Stefan
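A minimal sketch of how the two keyword arguments named above might be used, assuming an lxml release in which ObjectifyElementClassLookup accepts tree_class and empty_data_class. The classes MyTree and EmptyLeaf are hypothetical placeholders:

```python
from lxml import etree, objectify


class MyTree(objectify.ObjectifiedElement):
    """Hypothetical class for structural (inner tree) elements."""


class EmptyLeaf(objectify.StringElement):
    """Hypothetical class for leaves without any text content."""


lookup = objectify.ObjectifyElementClassLookup(
    tree_class=MyTree, empty_data_class=EmptyLeaf)
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

root = objectify.fromstring("<root><s/><t><u>5</u></t></root>", parser)

# Elements with children use tree_class, empty leaves use empty_data_class,
# and leaves with parsable text still go through the data type registry.
for element in (root, root.s, root.t, root.t.u):
    print(element.tag, type(element).__name__)
```

Note that the choice is made per lookup, so this alone does not give StringElement for some empty leaves and a tree class for others.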

Hello Stefan, first of all congrats for bringing out the 1.1 release! We are currently working hard on basing our toolkit on lxml.objectify and so far it works like a charm. Especially the ObjectPath functionality and the ease of hooking custom element classes into the lookup mechanism is great. I really think anybody considering to use amara.bindery or gnosis.objectify should have a good look at lxml.objectify. Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote on 14.09.2006 19:29:07:
Not in a hurry :-)
I'm still not quite sure what you mean by that. In my words: That will allow for customizing the behaviour when encountering
- an empty leaf node (empty_data_class gets chosen)
- an empty tree node = a node that contains no text but has children (tree_class gets chosen)
This is to not force an objectify user to follow our arbitrary (though with good reason ;-) decision to use StringElement for empty leaves. Right?
Great, I'll try it out. But I'm still voting for a TreeElement() factory as I'll have to write something like this anyway:
    def TreeElement():
        return objectify.Element('tree', {objectify.PYTYPE_ATTRIBUTE: 'TREE'})
And such a factory would complement the objectify module interface nicely, given there's also the DataElement function, imho. Btw. I've found that I rather often use ElementBase.__len__(<someObjectifiedObj>) to get the child count/find out if an element has children. What do you think about adding a hasChildren() or countChildren() function to objectify? Regards, Holger

## forgot to send this to the list also ## Hi Holger, Holger Joukl (Holger.Joukl@LBBW.de) wrote:
Cool, that's good to know.
Especially the ObjectPath functionality and the ease of hooking custom element classes into the lookup mechanism is great.
:) Custom element classes were one of the first things I implemented when I came to lxml. And I really like the way they now fit into the new lookup framework.
I really think anybody considering to use amara.bindery or gnosis.objectify should have a good look at lxml.objectify.
Hear, hear! :) Can we quote you on our web page? Like: "Google uses Python, but for critical stuff, my bank depends on lxml, because ..."
Right. There are a few more rules:
- has no parent -> tree
- has xsi:nil attribute -> NoneElement
- has parsable type -> type class
For that, yes, and for easily replacing the inner tree classes. I think that can be a pretty helpful thing if you want to extend the API. You can now replace the type classes through PyType, and the inner tree class and the default leaf class through the lookup mechanism. I think that's all you might need to extend objectify.
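The lookup rules can be observed directly on a parsed tree. This is a small sketch assuming a current lxml release with the default objectify parser; the sample document is made up for illustration:

```python
from lxml import objectify

XSI = "http://www.w3.org/2001/XMLSchema-instance"
root = objectify.fromstring(
    '<root xmlns:xsi="%s"><n xsi:nil="true"/><i>5</i><s/></root>' % XSI)

# root: element with children -> tree class
# n:    xsi:nil="true"        -> NoneElement
# i:    parsable text         -> type class from the registry
# s:    empty leaf            -> default empty data class
for element in (root, root.n, root.i, root.s):
    print(element.tag, type(element).__name__)
```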
No, that's redundant. Just use objectify.Element('tree') As I said, the main reason to call Element() is to create a tree element. If you want a data element, call DataElement(). So, no TreeElement() needed.
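A short sketch of that behaviour, assuming an lxml release in which objectify.Element() marks the new element with the "TREE" pytype as described earlier in the thread. The tag names are arbitrary:

```python
from lxml import objectify

root = objectify.Element("root")
root.sub = objectify.Element("whatever")  # stored as a copy, renamed to "sub"
before = type(root.sub).__name__

# Add a child and remove it again; the element is an empty leaf afterwards.
root.sub.x = 1
del root.sub.x
after = type(root.sub).__name__

# For comparison: an empty leaf that comes out of the parser.
parsed_leaf = objectify.fromstring("<root><s/></root>").s

print(before, after, type(parsed_leaf).__name__)
```

With the type hint in place, the structural element should stay a tree element even while it has no children, whereas the parsed empty leaf is still a StringElement.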
Interesting. I didn't realise there isn't really a way to find out about the children. They are in dir(), but only together with all methods etc. There's .getchildren(), but that builds all children, so it's less efficient than just counting them. A countchildren() method on ObjectifiedElement would match getchildren(), but it also adds another name that cannot be used to look up children. Maybe "countchildren" is a good one in that regard, though, as it's not really a good name for an XML tag. I'm -0 on haschildren(), though, as you can always call countchildren() if you expect the number to be small (note that this is about data binding, so very large documents are unlikely already) or use iterchildren().next() if you expect it to be really large. Child traversal is so fast that it shouldn't make too much of a difference if you count 100 children or only the first one. The method call overhead may even be the dominating factor here... I'll add a countchildren() method, then. Stefan
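A sketch of the two approaches, assuming an lxml release that ships countchildren(). The helper has_children is hypothetical and uses the built-in next() rather than the Python 2 style iterchildren().next():

```python
from lxml import objectify

root = objectify.fromstring("<root><a>1</a><a>2</a><b>3</b></root>")


def has_children(element):
    """Hypothetical helper: look at the first child only, do not count."""
    return next(element.iterchildren(), None) is not None


# len() on an objectified element counts same-tag siblings, not children,
# which is why a separate child counter is useful.
print(root.countchildren(), len(root.a))
print(has_children(root), has_children(root.b))
```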

Hi, the type registration of LongElement in _registerPyTypes in objectify.pyx (line 833 in my not totally up-to-date 1.1 checkout) should be changed from
    pytype = PyType('long', None, LongElement)
to
    pytype = PyType('long', long, LongElement)
to activate its actual usefulness. Funny I never noticed that before. Cheers, Holger

Holger Joukl wrote:
You didn't notice it because it is not used. As Python's int() accepts both ints and longs, there is no use in parsing values with both. It is only used for resolving the 'long' pytype hint, not for parsing longs. That's why the type check function is None. Stefan

Hi Stefan, lxml-dev-bounces@codespeak.net wrote on 20.09.2006 16:51:20:
That's why the type check function is None.
Yeah right. But if it was changed you'd be able to do
Still, you could of course achieve such a result by using DataElement. Holger

Hi, find attached a little doc patch proposal for objectify.txt, laying out the data type determination in (even greater) detail, plus minor additions for the additional data class workings. I also suggest changing doc/mkhtml.py:
    36,37c36,37
    < command = ('%s --stylesheet=%s --link-stylesheet %s > %s' %
    <            (script, stylesheet_url, source_path, dest_path))
    ---
    > command = ('%s %s --stylesheet=%s --link-stylesheet %s > %s' %
    >            (sys.executable, script, stylesheet_url, source_path, dest_path))
which allows for running things like
    PYTHON=/apps/pydev/gcc/3.4.4/bin/python2.4 make -e html
with the non-standard interpreter path actually getting picked up for rest2html invocation. Holger (See attached file: objectify.txt.diff.txt)
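The effect of the change can be sketched in isolation. The function build_command and the file names below are hypothetical; only the format string and the use of sys.executable come from the proposed patch:

```python
import sys


def build_command(script, stylesheet_url, source_path, dest_path):
    """Build the shell command, prefixed with the running interpreter."""
    return ('%s %s --stylesheet=%s --link-stylesheet %s > %s' %
            (sys.executable, script, stylesheet_url, source_path, dest_path))


command = build_command("rest2html.py", "style.css",
                        "objectify.txt", "objectify.html")
print(command)
```

Because sys.executable is the interpreter that runs the build script itself, the script is started with the same Python that make was told to use instead of whatever the script would otherwise resolve to.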

Hi Holger, Holger Joukl wrote:
I had to change a few passages, but mainly applied it as you wrote it. Please look through objectify.txt in the trunk (or 1.1 branch) to see if it fits your intention.
I applied that, too. Thanks for contributing. Stefan
participants (3): Holger Joukl, Martijn Faassen, Stefan Behnel