SAX: `raw XML'

Martin v. Löwis martin at v.loewis.de
Tue May 6 14:18:13 EDT 2003


jhefferon at smcvt.edu (Jim Hefferon) writes:

> I'm writing a routine to parse XML using SAX.  I'm worried about the kind
> of strings that the SAX parser is giving me.  In my startElement
> the documentation describes the parameter `name' as a "raw XML 1.0 string".  

I'm not quite sure what the author of this text meant when he wrote
"raw XML 1.0 string".

> Does that mean that to process it (in the Right Way) I should use a Python
> unicode string?  

Whatever the documentation means to say: It is a Unicode object,
representing the element name.

> Should I say something like
> 
> def startElement(self,name,attrs):
>     if ((self._state==u"start")
>         and (name==u"FirstName")):
>         self._state="FirstName"

That would be correct, yes. However, you can also compare with "FirstName"
(unless somebody has messed up the system encoding), since comparing
a Unicode string with the byte string "FirstName" first converts the
byte string to u"FirstName".
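
To make that concrete, here is a minimal sketch of such a handler. The
class name NameHandler and the sample document are made up for
illustration; only startElement and the _state attribute come from the
question. It is written so that it should also run on current Python,
where plain string literals are already Unicode and the u prefix is
optional.

```python
import xml.sax


class NameHandler(xml.sax.ContentHandler):
    """Hypothetical handler: remembers when a FirstName element starts."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self._state = "start"

    def startElement(self, name, attrs):
        # name is handed over as a Unicode string; comparing it with a
        # plain ASCII literal gives the same result as comparing it
        # with u"FirstName".
        if self._state == "start" and name == "FirstName":
            self._state = "FirstName"


handler = NameHandler()
# Made-up sample document, passed as a byte string.
xml.sax.parseString(b"<Person><FirstName>Jim</FirstName></Person>", handler)
print(handler._state)
```

The outer Person element leaves the state untouched; the state only
changes once the FirstName start tag is reported.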

You could also assign name directly to self._state, or call
name.encode("ascii") to get a byte string.
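
Both alternatives might look roughly like this. Again a sketch only:
RecordingHandler, byte_names and the sample document are hypothetical
names chosen for the example. Note that encode("ascii") raises
UnicodeEncodeError if an element name contains non-ASCII characters,
so it is only safe when you know the vocabulary is plain ASCII.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Hypothetical handler showing both alternatives side by side."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self._state = None
        self.byte_names = []

    def startElement(self, name, attrs):
        # Alternative 1: keep the Unicode element name itself as the state.
        self._state = name
        # Alternative 2: turn the name into an ASCII byte string.
        # This raises UnicodeEncodeError for non-ASCII element names.
        self.byte_names.append(name.encode("ascii"))


handler = RecordingHandler()
# Made-up sample document, passed as a byte string.
xml.sax.parseString(b"<Person><FirstName>Jim</FirstName></Person>", handler)
```

After parsing, _state holds the name of the last element that was
opened, and byte_names holds every element name as a byte string.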

Regards,
Martin



