Problem with "&" character in XML.
Kirt
moqtar at gmail.com
Thu Jul 13 03:25:13 EDT 2006
Thanks Stefan, your approach worked.
Stefan Behnel wrote:
> Kirt wrote:
> > How do I append characters to a string?
>
> I think the normal approach is to store an empty string (or list) in an
> attribute in startElement(), append to it in characters() and use the result
> in endElement().
>
> def startElement(self, name, attrs):
>     self.chars = ''
> def characters(self, s):
>     self.chars += s
> def endElement(self, name):
>     value = self.chars
>
> Or use a list and do this:
>
> def endElement(self, name):
>     value = ''.join(self.char_list)
>
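The list-based variant described above can be sketched as a complete handler. This is only an illustration in Python 3, not the code from the thread: the class name DirnameHandler, the <listing> root element and the shortened sample paths are assumptions made so that the example is self-contained and well-formed.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DirnameHandler(ContentHandler):
    """Collect the complete text of every <dirname> element."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.parts = []      # text chunks of the current <dirname>
        self.dirnames = []   # finished directory names

    def startElement(self, name, attrs):
        if name == "dirname":
            self.in_dirname = True
            self.parts = []

    def characters(self, content):
        # The parser may call this several times for one element,
        # for example before and after an "&amp;" entity.
        if self.in_dirname:
            self.parts.append(content)

    def endElement(self, name):
        if name == "dirname":
            # Only now is the text complete, so join the chunks here.
            self.dirnames.append("".join(self.parts).strip())
            self.in_dirname = False


# Hypothetical sample data, wrapped in one root element.
SAMPLE = (
    b"<listing>"
    b"<Directory><dirname>C:\\Desktop\\1\\bye w&amp;y </dirname></Directory>"
    b"<Directory><dirname>C:\\Desktop\\1\\hii wx</dirname></Directory>"
    b"</listing>"
)

handler = DirnameHandler()
xml.sax.parseString(SAMPLE, handler)
for dirname in handler.dirnames:
    print(dirname)
```

Printing only after endElement() means each directory name comes out on a single line, no matter how the parser splits the text between characters() calls.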
> Maybe you should consider switching to iterparse() of ElementTree or lxml.
> Should be a bit easier to use than SAX ...
>
> http://effbot.org/zone/element-iterparse.htm
> http://codespeak.net/svn/lxml/trunk/doc/api.txt
>
> Stefan
>
>
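For comparison, here is a rough sketch of the iterparse() route, using xml.etree.ElementTree from the Python 3 standard library. The helper name iter_dirnames, the <listing> root element and the shortened sample paths are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, wrapped in one root element.
SAMPLE = (
    b"<listing>"
    b"<Directory><dirname>C:\\Desktop\\1\\bye w&amp;y </dirname></Directory>"
    b"<Directory><dirname>C:\\Desktop\\1\\hii wx</dirname></Directory>"
    b"</listing>"
)


def iter_dirnames(source):
    """Yield the stripped text of each <dirname> element in source."""
    # With "end" events the element is complete when we see it, so
    # elem.text already holds the whole string with "&amp;" decoded.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "dirname":
            yield (elem.text or "").strip()
            elem.clear()  # drop the element's content once it is used


dirnames = list(iter_dirnames(io.BytesIO(SAMPLE)))
for dirname in dirnames:
    print(dirname)
```

There is no manual buffering here, because the element's text is assembled by the parser before the "end" event is reported.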
> > Stefan Behnel wrote:
> >> Kirt wrote:
> >>> I have walked a directory and written the following XML document.
> >>> One of the folders had an "&" character, so I replaced it with "&amp;".
> >>> #------------------test1.xml
> >>> <Directory>
> >>>   <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
> >>>   <file>
> >>>     <name>def.txt</name>
> >>>     <time>200607130417</time>
> >>>   </file>
> >>> </Directory>
> >>> <Directory>
> >>>   <dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
> >>>   <file>
> >>>     <name>abc.txt</name>
> >>>     <time>200607130415</time>
> >>>   </file>
> >>> </Directory>
> >>>
> >>> now in my python code i want to parse this doc and print the directory
> >>> name.
> >>> ###----------handler------------filename---handler.py
> >>> from xml.sax.handler import ContentHandler
> >>> class oldHandler(ContentHandler):
> >>>     def __init__(self):
> >>>         self.dn = 0
> >>>     def startElement(self, name, attrs):
> >>>         if name=='dirname':
> >>>             self.dn=1
> >>>
> >>>     def characters(self,str):
> >>>         if self.dn:
> >>>             print str
> >>
> >> The problem is here. "print" adds a newline. Don't use print, just append the
> >> characters (to a string or list) until the endElement callback is called.
> >>
> >>
> >>>     def endElement(self, name):
> >>>         if name == 'dirname':
> >>>             self.dn=0
> >>>
> >>>
> >>> #---------------------------------------------------------------------
> >>> #main code--- fname----art.py
> >>> import sys
> >>> from xml.sax import make_parser
> >>> from handler import oldHandler
> >>>
> >>> ch = oldHandler()
> >>> saxparser = make_parser()
> >>>
> >>> saxparser.setContentHandler(ch)
> >>> saxparser.parse(sys.argv[1])
> >>> #-----------------------------------------------------------------------------
> >>> I run the code as: $ python art.py test1.xml
> >>>
> >>> I am getting this output:
> >>>
> >>> C:\Documents and Settings\Administrator\Desktop\1\bye w
> >>> &
> >>> y
> >>> C:\Documents and Settings\Administrator\Desktop\1\hii wx
> >>>
> >>> whereas I need output that looks like this:
> >>> C:\Documents and Settings\Administrator\Desktop\1\bye w&y
> >>>
> >>> C:\Documents and Settings\Administrator\Desktop\1\hii wx
> >>>
> >>> Can someone tell me the solution for this?
> >>>
> >