[XML-SIG] pulldom CHARACTERS problem
Uche Ogbuji
Uche.Ogbuji at fourthought.com
Tue Mar 22 21:26:45 CET 2005
On Fri, 2005-03-11 at 18:16 +0900, Grant Morganryuuguu wrote:
> I solved the problem and am responding to myself for the benefit of future googlers.
> SAX parsers may split the text of a node of type CHARACTERS across several CHARACTERS events, so the pieces have to be joined back together. Since pulldom depends on a SAX parser, it may do this too. My method to find and join together the next CHARACTERS node is below. It assumes that
> self.event,self.node = iter.next()
> was executed previously.
>
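> def getCharacterNode(self, iter):
>     # Skip forward to the first CHARACTERS event.
>     while self.event != 'CHARACTERS':
>         self.event, self.node = iter.next()
>     # Collect it and every CHARACTERS event that immediately follows.
>     chars = [self.node.nodeValue]
>     self.event, self.node = iter.next()
>     while self.event == 'CHARACTERS':
>         chars.append(self.node.nodeValue)
>         self.event, self.node = iter.next()
>     # self.event/self.node now hold the first non-CHARACTERS event.
>     return ''.join(chars)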
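With present-day Python 3, the same joining can be done once, as a generator wrapped around the pulldom event stream, so the rest of the code never sees a split text run. This is a minimal sketch, not Grant's code: the name coalesce_characters is hypothetical, and it assumes it is acceptable to merge each run into the first Text node of that run by assigning to its data attribute.

```python
from xml.dom import pulldom


def coalesce_characters(events):
    """Yield pulldom (event, node) pairs with adjacent CHARACTERS merged.

    Each run of consecutive CHARACTERS events becomes one CHARACTERS
    event whose Text node holds the joined text.
    """
    first = None  # first Text node of the current run, if any
    parts = []    # text fragments collected for that run
    for event, node in events:
        if event == pulldom.CHARACTERS:
            if first is None:
                first = node
            parts.append(node.data)
            continue
        if first is not None:
            # A non-text event ends the run: emit the merged text first.
            first.data = "".join(parts)
            yield pulldom.CHARACTERS, first
            first, parts = None, []
        yield event, node
    if first is not None:
        # The stream ended in the middle of a text run.
        first.data = "".join(parts)
        yield pulldom.CHARACTERS, first


doc = pulldom.parseString("<p>fish &amp; chips\nand peas</p>")
for event, node in coalesce_characters(doc):
    if event == pulldom.CHARACTERS:
        print(repr(node.data))  # prints 'fish & chips\nand peas'
```

Unlike getCharacterNode, the generator does not need to be told where a text run starts; it simply buffers until the next non-text event.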
Or see:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881
and the updated version that is part of Amara:
http://www.xml.com/pub/a/2005/01/19/amara.html
http://cvs.4suite.org/viewcvs/Amara/lib/saxtools.py?rev=1.9&view=markup
(class normalize_text_filter, which you should be able to copy to your
code if you don't want to install Amara).
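Amara's normalize_text_filter works at the SAX level instead. The class below is not that code; it is a minimal sketch of the same idea built on the standard library's xml.sax.saxutils.XMLFilterBase, with hypothetical names CoalescingTextFilter and TextCollector. It buffers characters() calls and forwards them downstream as a single call just before the next non-text event.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class CoalescingTextFilter(XMLFilterBase):
    """SAX filter that merges each run of characters() calls into one."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []  # buffered text fragments

    def _flush(self):
        if self._pending:
            text = "".join(self._pending)
            self._pending = []
            super().characters(text)

    def characters(self, content):
        self._pending.append(content)

    def startDocument(self):
        self._pending = []  # discard leftovers from an aborted parse
        super().startDocument()

    # Every other content event first flushes any buffered text.
    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def startElementNS(self, name, qname, attrs):
        self._flush()
        super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._flush()
        super().endElementNS(name, qname)

    def startPrefixMapping(self, prefix, uri):
        self._flush()
        super().startPrefixMapping(prefix, uri)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def ignorableWhitespace(self, whitespace):
        self._flush()
        super().ignorableWhitespace(whitespace)

    def skippedEntity(self, name):
        self._flush()
        super().skippedEntity(name)

    def endDocument(self):
        self._flush()
        super().endDocument()


class TextCollector(ContentHandler):
    """Records every characters() call it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


collector = TextCollector()
text_filter = CoalescingTextFilter(xml.sax.make_parser())
text_filter.setContentHandler(collector)
text_filter.parse(io.BytesIO(b"<p>fish &amp; chips\nand peas</p>"))
print(collector.chunks)  # prints ['fish & chips\nand peas']
```

Because the filter is itself an XMLReader, downstream handlers are written exactly as if they were attached to the parser directly; they just never see a text run split in two.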
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com