[XML-SIG] Re: checking a string for well-formedness

"Martin v. Löwis" martin@v.loewis.de
Fri, 09 May 2003 14:57:40 +0200


Paul Tremblay wrote:

> I must be dense when it comes to unicode. So Python converts unicode
> to a 7-bit (ASCII) string?

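In some cases, yes. If you pass a Unicode string to an API function that
requires a byte string, such as file.write, Python implicitly converts it
to a byte string using the system default encoding, which is ASCII. If
the string contains characters outside ASCII, that conversion fails with
a UnicodeError.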
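As an illustration, here is a minimal sketch of that implicit conversion done explicitly. It assumes current Python 3, which no longer converts implicitly and whose default encoding is UTF-8, so the helper `encode_like_python2` (a hypothetical name) pins the encoding to ASCII to mimic the behaviour described above.

```python
import sys

def encode_like_python2(text, encoding="ascii"):
    """Explicitly do what the implicit unicode-to-bytes conversion did.

    Hypothetical helper: the encoding defaults to ASCII (the old default)
    rather than sys.getdefaultencoding(), which is UTF-8 on Python 3.
    """
    # Non-ASCII characters make str.encode raise UnicodeEncodeError,
    # which is a subclass of UnicodeError.
    return text.encode(encoding)

print(sys.getdefaultencoding())            # prints utf-8 on Python 3
print(encode_like_python2("plain text"))   # b'plain text'

try:
    encode_like_python2("L\u00f6wis")
except UnicodeError as exc:
    print("conversion failed:", exc)
```

Encoding with "utf-8" instead of "ascii" would succeed for the second string, which is usually the better choice when writing XML.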

The resulting strings are still 8-bit strings (i.e. byte strings), since
your computer cannot represent 7-bit quantities. However, the most
significant bit (MSB) of every byte will be 0.
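A small sketch (Python 3 assumed; `all_msb_clear` is a hypothetical name) that checks the property just stated: every byte of ASCII-encoded text has its top bit clear, while UTF-8 output for non-ASCII text does not.

```python
def all_msb_clear(data: bytes) -> bool:
    """Return True if the most significant bit of every byte is 0."""
    # 0x80 masks bit 7, the MSB of an 8-bit byte.
    return all((byte & 0x80) == 0 for byte in data)

print(all_msb_clear("checking a string".encode("ascii")))  # True
print(all_msb_clear("L\u00f6wis".encode("utf-8")))         # False
```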

> The first time the string is tested, it comes out as valid. But every
> single instance afterwards comes out as ill-formed XML.

The parser maintains internal state to remember where in the document it
is. Once parsing completes, that state says "at the end of the
document", and it is an error to feed any more markup at this point.
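To make the failure mode concrete, here is a minimal sketch, assuming the parser in question is a pyexpat parser from `xml.parsers.expat` on current Python 3 (the original code was not posted). The helper name `parse_twice_with_one_parser` is hypothetical.

```python
import xml.parsers.expat

def parse_twice_with_one_parser(first, second):
    """Feed two complete documents to the same expat parser.

    Returns the ExpatError raised by the second call, or None.
    """
    parser = xml.parsers.expat.ParserCreate()
    parser.Parse(first, True)      # isfinal=True: this document is complete
    try:
        # The parser's state is now "at the end of the document", so more
        # markup is an error even if the second document is well-formed.
        parser.Parse(second, True)
    except xml.parsers.expat.ExpatError as exc:
        return exc
    return None

error = parse_twice_with_one_parser("<doc>first</doc>", "<doc>second</doc>")
print("second parse failed:", error)
```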

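You either need to throw away the parser object and create a new one for
each string, or reset the parser object that you already have, if its
interface supports that (a SAX IncrementalParser has a reset() method,
for example).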
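Both approaches might look roughly like this in current Python 3. This is a sketch, not the original poster's code: `is_well_formed` and `is_well_formed_reusing` are hypothetical names, the first creates a fresh expat parser per call, and the second reuses one SAX reader from `xml.sax.make_parser()` and calls its `reset()` after `close()`.

```python
import xml.parsers.expat
import xml.sax

def is_well_formed(text):
    """Return True if text is a well-formed XML document.

    A new expat parser is created on every call, so no state
    is carried over from a previous document.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)   # True: this is the whole document
    except xml.parsers.expat.ExpatError:
        return False
    return True

def is_well_formed_reusing(reader, text):
    """Same check, reusing one SAX IncrementalParser via reset()."""
    try:
        reader.feed(text)
        reader.close()   # ends the document; feed() or close() raises if ill-formed
        return True
    except xml.sax.SAXParseException:
        return False
    finally:
        reader.reset()   # ready the reader for the next document

print(is_well_formed("<doc>ok</doc>"), is_well_formed("<doc>"))   # True False

reader = xml.sax.make_parser()
for doc in ["<a/>", "<b>", "<c>text</c>"]:
    print(doc, is_well_formed_reusing(reader, doc))
```

Creating a new parser per string is the simpler of the two; parser creation is cheap compared with the parsing itself.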

Regards,
Martin