[lxml-dev] objectify factories

Hi Holger,

I finally took a closer look at the latest patches you sent me. They contain good ideas, but I'm still not very comfortable with the implementation. I think the PT() factory should be folded into DataElement(), as it's just a special case. I attached a patch that merges part of your latest factory patch and removes the need for the PT() factory. Please check whether it does what you wanted and whether anything is still missing.

Regarding the TypedElementMaker, I think that if we write one that is adapted to objectify, we should not stop half-way. We should remove the "typemap" thing and just use the type inference mechanisms that objectify already provides. You can look into that if you want; otherwise I will try to come up with an implementation when I find the time (which may be after the release of 2.0alpha1).

Holger, is this OK with you, or is there any reason we should not go this way?

Stefan

Stefan Behnel wrote:
... or a bit before :)

This code is what I think might work well for objectify. Any general comments before I go any further?

Stefan

    cdef class ElementMaker:
        cdef object _makeelement
        cdef object _namespace
        cdef object _nsmap

        def __init__(self, namespace=None, nsmap=None, makeelement=None):
            self._nsmap = nsmap
            if namespace is None:
                self._namespace = None
            else:
                self._namespace = "{%s}" % namespace
            if makeelement is not None:
                assert callable(makeelement)
                self._makeelement = makeelement
            else:
                self._makeelement = None

        def __getattr__(self, tag):
            # Prefix the default namespace unless the tag is already qualified.
            if tag[0] != "{" and self._namespace is not None:
                tag = self._namespace + tag
            return _ObjectifyElementMakerCaller(
                self._makeelement, tag, self._nsmap)

    cdef class _ObjectifyElementMakerCaller:
        cdef object _tag
        cdef object _nsmap
        cdef object _element_factory

        def __init__(self, element_factory, tag, nsmap):
            self._element_factory = element_factory
            self._tag = tag
            self._nsmap = nsmap

        def __call__(self, *children, **attrib):
            cdef _ObjectifyElementMakerCaller elementMaker
            cdef python.PyObject* pytype
            cdef _Element element
            if self._element_factory is None:
                element = cetree.makeElement(
                    self._tag, None, objectify_parser, None, None,
                    attrib, self._nsmap)
            else:
                element = self._element_factory(self._tag, attrib, self._nsmap)

            for child in children:
                if child is None:
                    # A lone None child marks the element as xsi:nil="true".
                    if len(children) == 1:
                        cetree.setAttributeValue(
                            element, XML_SCHEMA_INSTANCE_NIL_ATTR, "true")
                elif python._isString(child):
                    _add_text(element, child)
                elif isinstance(child, _Element):
                    cetree.appendChild(element, child)
                elif isinstance(child, _ObjectifyElementMakerCaller):
                    # An uncalled maker (e.g. E.child) becomes an empty child element.
                    elementMaker = <_ObjectifyElementMakerCaller>child
                    if elementMaker._element_factory is None:
                        child = cetree.makeElement(
                            elementMaker._tag, element._doc, objectify_parser,
                            None, None, None, None)
                    else:
                        child = elementMaker._element_factory(
                            (<_ObjectifyElementMakerCaller>child)._tag)
                    cetree.appendChild(element, child)
                else:
                    # Other values: use the registered PyType for the Python
                    # type name if there is one, otherwise fall back to str().
                    pytype = python.PyDict_GetItem(
                        _PYTYPE_DICT, _typename(child))
                    if pytype is not NULL:
                        (<PyType>pytype)._stringify(element, child)
                    else:
                        child = str(child)
                        _add_text(element, child)
            return element
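For readers without a Cython build, the dispatch in `__call__` above can be mirrored in plain Python. This is a simplified sketch, not lxml's implementation: it assumes the standard library's `xml.etree.ElementTree` in place of lxml's internal `cetree` C-API, and the names `SketchElementMaker`, `_SketchCaller` and `_add_text` here are hypothetical. It keeps the namespace prefixing in `__getattr__`, the `xsi:nil` handling for a lone `None` child, and the text/element/maker dispatch, but it has no pytype registry, so all other values simply go through `str()`.

```python
import xml.etree.ElementTree as ET

XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def _add_text(element, text):
    """Append text after the last child, or to element.text if there are no children."""
    if len(element):
        last = element[-1]
        last.tail = (last.tail or "") + text
    else:
        element.text = (element.text or "") + text


class _SketchCaller:
    """Hypothetical stand-in for _ObjectifyElementMakerCaller."""

    def __init__(self, tag):
        self._tag = tag

    def __call__(self, *children, **attrib):
        element = ET.Element(self._tag, attrib)
        for child in children:
            if child is None:
                # A lone None child marks the element as nil.
                if len(children) == 1:
                    element.set(XSI_NIL, "true")
            elif isinstance(child, str):
                _add_text(element, child)
            elif isinstance(child, ET.Element):
                element.append(child)
            elif isinstance(child, _SketchCaller):
                # E.child passed without calling it -> empty child element.
                element.append(ET.Element(child._tag))
            else:
                # No PyType registry in this sketch: fall back to str().
                _add_text(element, str(child))
        return element


class SketchElementMaker:
    """Hypothetical, simplified analogue of the objectify ElementMaker."""

    def __init__(self, namespace=None):
        self._namespace = None if namespace is None else "{%s}" % namespace

    def __getattr__(self, tag):
        # Qualify unqualified tags with the default namespace, if any.
        if tag[0] != "{" and self._namespace is not None:
            tag = self._namespace + tag
        return _SketchCaller(tag)


E = SketchElementMaker()
root = E.root(E.a(5), E.b("text"), E.empty, E.c(None))
print(ET.tostring(root).decode())
```

Returning a separate caller object from `__getattr__` is what lets `E.empty` (not called) and `E.a(5)` (called) both work as children.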

Hi,
I finally looked at the new ElementMaker implementation, and it works just fine for me. The attached patch adds tests for it (essentially the same ones I already had for the initial implementation), plus some small tests that cover the DataElement() "none" vs. "NoneType" name-compatibility measures. However, this ElementMaker does not add type annotations, as the TypedElementMaker I proposed at some point did. Question: do we want/need a TypedElementMaker? I'd say yes; otherwise the E-factory isn't very useful for someone who wants "strong typing". Also, I think it might make sense to have a PT() factory after all, just to add the possibility of handing in attributes:
What do you say?

Holger
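To make the idea of a PT() factory that accepts attributes concrete, here is a minimal plain-Python sketch. It is an illustration under assumptions, not the factory Holger proposed: it uses `xml.etree.ElementTree`, a hypothetical default tag name `value`, and annotates the element with the Python type name of its argument, using the pytype namespace URI that appears in the doctest output later in this thread.

```python
import xml.etree.ElementTree as ET

# pytype annotation attribute; the namespace URI matches the doctest output.
PYTYPE_ATTR = "{http://codespeak.net/lxml/objectify/pytype}pytype"


def PT(value, tag="value", **attrib):
    """Hypothetical PT()-style factory: build a data element whose pytype is the
    Python type name of `value`, with extra XML attributes given as keywords."""
    element = ET.Element(tag, attrib)
    # Booleans are written as lowercase "true"/"false", as in the objectify doctests.
    if isinstance(value, bool):
        element.text = str(value).lower()
    else:
        element.text = str(value)
    element.set(PYTYPE_ATTR, type(value).__name__)
    return element


d = PT("how", tag="d", tell="me")
print(ET.tostring(d).decode())
```

Because `tag` is a keyword parameter in this sketch, an XML attribute literally named `tag` could not be passed this way; a real factory would need another spelling for it.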

Meaning an additional TypedElementMaker, right? I think it is actually nice to have the non-annotating ElementMaker as a choice.
Yes, it does, but not in the same way. If no type is given explicitly, DataElement() always infers it from the string literal of the RVAL, i.e. it does not use the Python type of the RVAL. PT(), on the other hand, works just like the new __setattr__/_setElementValue. Maybe we should modify DataElement() instead of introducing PT(), then?

Holger
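The difference described here can be shown with two tiny lookup functions. Both are hypothetical simplifications, not objectify's actual type-check order: `infer_from_literal()` looks only at the string form of a value (DataElement()-style), while `type_name()` uses the Python type of the value itself (PT()/_setElementValue-style). The interesting case is the string "3": its literal looks like an int, but its Python type is str.

```python
def infer_from_literal(text):
    """Hypothetical DataElement()-style lookup: only the string form is inspected.
    The check order (int, float, bool, str) is a simplification."""
    for name, parse in (("int", int), ("float", float)):
        try:
            parse(text)
        except ValueError:
            continue
        return name
    if text in ("true", "false"):
        return "bool"
    return "str"


def type_name(value):
    """PT()/_setElementValue-style lookup: use the Python type of the value."""
    return type(value).__name__


for value in (3, "3", 6.1, "how"):
    print(repr(value), infer_from_literal(str(value)), type_name(value))
```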

But it does not add the type annotation attribute to an element, so:
although "3" is not an integer. Much the same as what we discussed regarding the DataElement() / PT() behaviour.
I'm about to change DataElement to use the new _setElementValue semantics.

Holger

Hi,
Patch attached:

- makes DataElement() behaviour consistent with the new _setElementValue workings, using _typename(<value>) as the pytype name if none is given explicitly
- makes the objectify ElementMaker add pytype annotations
- fixes _setElementValue() to safely delAttributeFromNsName() for unregistered types
- adds some tests for that stuff

It works for me, except for these doctests:

    TESTED VERSION: 2.0.alpha1-46212
        Python:           (2, 4, 4, 'final', 0)
        lxml.etree:       (2, 0, -199, 46212)
        libxml used:      (2, 6, 27)
        libxml compiled:  (2, 6, 27)
        libxslt used:     (1, 1, 20)
        libxslt compiled: (1, 1, 20)

    758/758 (100.0%): Doctest: xpathxslt.txt
    ======================================================================
    FAIL: Doctest: element_classes.txt
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/apps/prod//lib/python2.4/unittest.py", line 260, in run
        testMethod()
      File "/apps/prod//lib/python2.4/doctest.py", line 2157, in runTest
        raise self.failureException(self.format_failure(new.getvalue()))
    AssertionError: Failed doctest test for element_classes.txt
      File "/data/pydev/hjoukl/LXML/lxml/src/lxml/tests/../../../doc/element_classes.txt", line 0

    ----------------------------------------------------------------------
    File "/data/pydev/hjoukl/LXML/lxml/src/lxml/tests/../../../doc/element_classes.txt", line 347, in element_classes.txt
    Failed example:
        print honk_element[0].honking
    Expected:
        Traceback (most recent call last):
        ...
        AttributeError: 'lxml.etree._Element' object has no attribute 'honking'
    Got:
        Traceback (most recent call last):
          File "/apps/prod//lib/python2.4/doctest.py", line 1248, in __run
            compileflags, 1) in test.globs
          File "<doctest element_classes.txt[66]>", line 1, in ?
            print honk_element[0].honking
        AttributeError: 'etree._Element' object has no attribute 'honking'

    ======================================================================
    FAIL: Doctest: objectify.txt
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/apps/prod//lib/python2.4/unittest.py", line 260, in run
        testMethod()
      File "/apps/prod//lib/python2.4/doctest.py", line 2157, in runTest
        raise self.failureException(self.format_failure(new.getvalue()))
    AssertionError: Failed doctest test for objectify.txt
      File "/data/pydev/hjoukl/LXML/lxml/src/lxml/tests/../../../doc/objectify.txt", line 0

    ----------------------------------------------------------------------
    File "/data/pydev/hjoukl/LXML/lxml/src/lxml/tests/../../../doc/objectify.txt", line 283, in objectify.txt
    Failed example:
        print etree.tostring(root, pretty_print=True)
    Expected:
        <root>
          <a>5</a>
          <b>6.1</b>
          <c>true</c>
          <d tell="me">how</d>
        </root>
    Got:
        <root xmlns:ns0="http://codespeak.net/lxml/objectify/pytype" ns0:pytype="TREE">
          <a xmlns:ns0="http://codespeak.net/lxml/objectify/pytype" ns0:pytype="int">5</a>
          <b ns0:pytype="float">6.1</b>
          <c ns0:pytype="bool">true</c>
          <d tell="me" ns0:pytype="str">how</d>
        </root>
    ----------------------------------------------------------------------
    File "/data/pydev/hjoukl/LXML/lxml/src/lxml/tests/../../../doc/objectify.txt", line 302, in objectify.txt
    Failed example:
        print etree.tostring(root, pretty_print=True)
    Expected:
        <root>
          <title>The title</title>
          <type>5</type>
        </root>
    Got:
        <root xmlns:ns0="http://codespeak.net/lxml/objectify/pytype" ns0:pytype="TREE">
          <title xmlns:ns0="http://codespeak.net/lxml/objectify/pytype" ns0:pytype="str">The title</title>
          <type ns0:pytype="int">5</type>
        </root>

    ----------------------------------------------------------------------
    Ran 758 tests in 2.762s

    FAILED (failures=2)
    make: *** [test_inplace] Error 1

The first one seems to fail because of some change in element naming. The others result from the type information that the ElementMaker now adds, but I left them in because I'm not sure why the nsmap "aggregation" does not work here as expected (the ElementMaker uses _DEFAULT_NSMAP).

Holger

Hi Holger,

the trunk now builds with Cython instead of Pyrex, so please install it to get rid of the one failing doctest. (The test fails because Cython knows about the package you specify in distutils, while Pyrex ignores it; that is why the expected class name is 'lxml.etree._Element' rather than 'etree._Element'.)

http://www.cython.org/

lxml requires Cython 0.9.6.5.

jholg@gmx.de wrote:
> Patch attached:
> - makes DataElement() behaviour consistent with new _setElementValue workings, using _typename(<value>) as pytype name, if not explicitly given

ok.

> - makes objectify ElementMaker add pytype annotation

ok. I modified the patch, mainly for performance reasons.

> - fixes _setElementValue() to safely delAttributeFromNsName() for unregistered types

Ah, thanks for testing :)

> - adds some tests for that stuff

Great, thanks.

:)

_DEFAULT_NSMAP is only set *after* the E factory is instantiated, so its None value is copied into the instance.

Stefan

Hi,
Lazy me, it's not like you hadn't announced that quite a while ago. Unfortunately, though: I just downloaded Cython and tried to build lxml:

    0 lb54320@adevp02 .../lxml $ /apps/pydev/bin/python2.4 setup.py build
    Traceback (most recent call last):
      File "setup.py", line 28, in ?
        import setupinfo
      File "/data/pydev/hjoukl/LXML/lxml/setupinfo.py", line 5, in ?
        from Cython.Distutils import build_ext as build_pyx
    [...]
      File "/apps/pydev/lib/python2.4/site-packages/Cython/Compiler/TypeSlots.py", line 88
        full_args = "O" + self.fixed_arg_format if self.has_dummy_arg else self.fixed_arg_format
                                                ^
    SyntaxError: invalid syntax

It seems Cython relies on Python 2.5 syntax, which renders it unusable for me. Any chance to remove the hard dependency on 2.5 syntax? At a quick glance, this seems to be the only place where conditional expressions turn up.

Holger
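The failing line uses a conditional expression, which was introduced in Python 2.5. As a sketch of the kind of change being asked for, the same logic can be written with a plain `if` statement that Python 2.4 accepts. The two wrapper functions and their names are hypothetical; only the expression itself comes from the traceback. Note the precedence: `+` binds tighter than the conditional, so the original means `("O" + fmt) if has_dummy_arg else fmt`.

```python
def full_args_py25(fixed_arg_format, has_dummy_arg):
    # Conditional expression as in the traceback (needs Python 2.5+).
    return "O" + fixed_arg_format if has_dummy_arg else fixed_arg_format


def full_args_py24(fixed_arg_format, has_dummy_arg):
    # Equivalent spelling that Python 2.4 accepts: a plain if statement.
    full_args = fixed_arg_format
    if has_dummy_arg:
        full_args = "O" + full_args
    return full_args
```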

Hi,

back from the ever-too-short holidays...

I'm having a bit of trouble applying the patch, as some of it already seems to be in the trunk now, so maybe I'm mistaken, but: how does this remove the need for PT(), which uses the Python type name of its argument as the pytype? Wouldn't folding this into DataElement() change DataElement()'s behaviour significantly, since it currently just does a type lookup on the string literal?

I saw you already posted an implementation, so I'll have a look at that.

Cheers,
Holger

Hi Holger,

jholg@gmx.de wrote:
> back from the ever-too-short holidays...
welcome back to work. :)
Sorry, it already is in the trunk.
OK, I was mistaken. I only applied the bit that special-cased ObjectifyDataElements passed into DataElement(), and even that might need some rework: if you pass an ODE *and* a _pytype, you'd have to convert the value to a string and process it with the normal machinery. You're right, PT() serves a different purpose. I think it makes sense to add it. (I know, I keep changing my mind here, but it really looks like a helpful little factory.)
Thanks. Stefan
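The point about passing an ODE together with a `_pytype` can be sketched in plain Python. Everything here is hypothetical: `DataValue` stands in for an ObjectifyDataElement, `_PYTYPES` is a toy registry covering only int, float and str, and `data_element()` is not lxml's DataElement(). When a different type is requested, the existing value is reduced to its string form and sent through the normal path, which also validates it against the requested type.

```python
# Hypothetical toy registry: pytype name -> (parse from text, format to text).
_PYTYPES = {
    "int": (int, str),
    "float": (float, repr),
    "str": (str, str),
}


class DataValue:
    """Hypothetical stand-in for an ObjectifyDataElement: text plus a pytype name."""

    def __init__(self, text, pytype):
        self.text = text
        self.pytype = pytype

    @property
    def pyval(self):
        parse, _ = _PYTYPES[self.pytype]
        return parse(self.text)


def data_element(value, _pytype=None):
    """Hypothetical DataElement()-like factory (not lxml's API)."""
    if isinstance(value, DataValue):
        if _pytype is None or _pytype == value.pytype:
            # No conflicting type requested: copy the existing data element.
            return DataValue(value.text, value.pytype)
        # A different _pytype was requested: fall back to the string form
        # and run it through the normal machinery below.
        value = value.text
    if _pytype is None:
        _pytype = type(value).__name__
    parse, fmt = _PYTYPES[_pytype]
    # Parsing validates the value against the target type (ValueError if not).
    return DataValue(fmt(parse(value)), _pytype)


five = data_element(5)
as_float = data_element(five, _pytype="float")
```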

> welcome back to work. :)

Thanks!

Maybe we don't need it after all, if the general behaviour changes to automatically add the Python type name of RVALs.

Holger

participants (2)
- jholg@gmx.de
- Stefan Behnel