[XML-SIG] Entity management question --

Dennis Allison allison@sumeru.stanford.EDU
Sun, 5 May 2002 10:00:05 -0700 (PDT)


Thanks for the response.  The problem is that the unicode representation
is not a transparent one.  My post was the result of a preprocessor
program failing.  The preprocessor takes a collection of XML documents,
traverses them, modifies and edits them, writes out revised XML, and
then constructs table presets for a database that works in conjunction
with the XML--the tools are mostly in Python and use the latest PyXML
release.  The XML representation, and everything derived from it, yields
Unicode strings. Unfortunately, many of the other tools fail and die on
something as simple as writestream().  I solved my immediate problem by
writing a conversion from Unicode strings to ASCII with embedded HTML
entities and using it where needed.
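The conversion itself isn't shown in the post; what follows is only a minimal sketch of that kind of conversion, assuming Python 3 (where the old htmlentitydefs module survives as html.entities). The function name to_ascii_with_entities is hypothetical. It uses a named HTML 4 entity where one exists and falls back to a numeric character reference otherwise.

```python
from html.entities import codepoint2name  # Python 3 home of htmlentitydefs' table


def to_ascii_with_entities(text: str) -> str:
    """Hypothetical helper: turn a Unicode string into pure ASCII.

    Non-ASCII characters become named HTML entities (e.g. &eacute;) when
    HTML 4 defines one, and numeric character references (e.g. &#9731;)
    otherwise. ASCII characters, including & < >, are passed through
    untouched on the assumption that markup escaping happens elsewhere.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII: keep as-is
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp]) # named entity reference
        else:
            out.append("&#%d;" % cp)                # numeric character reference
    return "".join(out)


print(to_ascii_with_entities("caf\u00e9"))  # prints caf&eacute;
```

Because the result is plain ASCII, it can be handed to tools that choke on non-ASCII Unicode strings.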

On 5 May 2002, Martin v. Loewis wrote:

> Dennis Allison <allison@sumeru.stanford.EDU> writes:
> 
> > The problem is recapturing the HTML-ish entities that have been converted
> > to unicode.  Does such a beast exist?  And where can it be found?
> 
> Why is it desirable to restore those entities? I can offer a number of
> alternatives:
> 
> - generate UTF-8 on output, then you will never ever need to create
>   references
> - generate character references instead of entity references
> 
> Either approach will create well-formed HTML. If you think you must
> have entity references, you can use the htmlentitydefs module to
> generate *all* entity references.
> 
> Regards,
> Martin
>
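The two alternatives Martin lists can be sketched in a few lines of Python 3. This is an illustration, not code from the thread, and the function names are hypothetical. UTF-8 can represent every character, so no references are ever needed. The built-in "xmlcharrefreplace" error handler emits numeric character references for anything outside ASCII.

```python
def as_utf8(text: str) -> bytes:
    # Alternative 1: encode as UTF-8; every character is representable,
    # so no entity or character references are required.
    return text.encode("utf-8")


def as_char_refs(text: str) -> bytes:
    # Alternative 2: stay in ASCII, replacing each non-ASCII character
    # with a numeric character reference such as &#233;.
    return text.encode("ascii", "xmlcharrefreplace")


print(as_char_refs("caf\u00e9"))  # prints b'caf&#233;'
```

Either output is plain bytes, so it can be written to a binary stream without triggering an implicit ASCII encode of a Unicode string.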