[XML-SIG] pulldom CHARACTERS problem

Uche Ogbuji Uche.Ogbuji at fourthought.com
Tue Mar 22 21:26:45 CET 2005

On Fri, 2005-03-11 at 18:16 +0900, Grant Morganryuuguu wrote:
> I solved the problem and am responding to myself for the benefit of future googlers.
> SAX parsers may split one run of character data into several consecutive CHARACTERS events, so the pieces have to be joined back together. Since pulldom is built on a SAX parser, it may do the same. My method for finding the next CHARACTERS event and joining it with any that immediately follow is below. It assumes that
> self.event, self.node = next(events)
> was executed previously.
>      def getCharacterNode(self, events):
>          # Skip ahead to the first CHARACTERS event ('CHARACTERS' == pulldom.CHARACTERS)
>          while self.event != 'CHARACTERS':
>              self.event, self.node = next(events)
>          chars = [self.node.nodeValue]
>          # Collect the consecutive CHARACTERS events; (None, None) marks end of stream
>          self.event, self.node = next(events, (None, None))
>          while self.event == 'CHARACTERS':
>              chars.append(self.node.nodeValue)
>              self.event, self.node = next(events, (None, None))
>          return ''.join(chars)
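The same joining can also be done once, up front, by wrapping the whole event stream instead of calling a method at each text position. This is a minimal sketch, not code from this thread: `merged_events` is a hypothetical helper name, and it assumes the event nodes are minidom Text nodes (which is what `xml.dom.pulldom` produces), so their `data` can be overwritten with the joined text.

```python
from xml.dom import pulldom


def merged_events(events):
    """Yield pulldom (event, node) pairs, coalescing each run of adjacent
    CHARACTERS events into one event whose Text node holds the joined text.

    Hypothetical helper for illustration; assumes the nodes are minidom
    Text nodes, as produced by xml.dom.pulldom.
    """
    first_text = None  # Text node of the first event in the current run
    parts = []         # text of every event in the current run
    for event, node in events:
        if event == pulldom.CHARACTERS:
            if first_text is None:
                first_text = node
            parts.append(node.data)
            continue
        if first_text is not None:
            # A non-text event ends the run: emit the joined text first
            first_text.data = ''.join(parts)
            yield pulldom.CHARACTERS, first_text
            first_text, parts = None, []
        yield event, node
    if first_text is not None:
        # The stream ended in the middle of a text run
        first_text.data = ''.join(parts)
        yield pulldom.CHARACTERS, first_text
```

Iterating merged_events(pulldom.parseString('<p>a &amp; b</p>')) should give a single CHARACTERS event carrying 'a & b', however the parser happened to chunk the text. Unlike the quoted method, the caller no longer has to track self.event and self.node across calls.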

Or see:


and the updated version that is part of Amara:

(class normalize_text_filter, which you should be able to copy to your
code if you don't want to install Amara).
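For readers who prefer to fix this at the SAX level, here is a rough sketch of a text-joining SAX filter. It is not Amara's normalize_text_filter; `TextJoiningFilter` and `collect_text` are hypothetical names, built only on the standard library's `xml.sax.saxutils.XMLFilterBase`. The filter buffers `characters()` calls and forwards the joined text downstream as one call just before the next element boundary, processing instruction, ignorable whitespace, or end of document. It does not handle comments, which plain SAX content handlers never see.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class TextJoiningFilter(XMLFilterBase):
    """Hypothetical SAX filter: buffers characters() calls and passes the
    joined text downstream as one call before the next non-text event."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []  # text chunks seen since the last flush

    def _flush(self):
        # Forward any buffered text as a single characters() call
        if self._pending:
            text = ''.join(self._pending)
            self._pending = []
            super().characters(text)

    def characters(self, content):
        self._pending.append(content)

    def ignorableWhitespace(self, whitespace):
        self._flush()
        super().ignorableWhitespace(whitespace)

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def startElementNS(self, name, qname, attrs):
        self._flush()
        super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._flush()
        super().endElementNS(name, qname)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


def collect_text(xml_bytes):
    """Run xml_bytes through the filter; return the characters() chunks
    that the downstream handler actually received."""
    received = []

    class Collector(ContentHandler):
        def characters(self, content):
            received.append(content)

    text_filter = TextJoiningFilter(xml.sax.make_parser())
    text_filter.setContentHandler(Collector())
    text_filter.parse(io.BytesIO(xml_bytes))
    return received
```

With the filter in place, collect_text(b'<p>a &amp; b</p>') should return ['a & b'], so downstream handlers can treat each characters() call as a complete text node.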

Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Gems from the Mines: 2002 to 2003 - http://www.xml.com/pub/a/2005/03/02/pyxml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html
