HowTo Fix illegal chars in xml-documents??

Andreas Jung andreas at digicool.com
Mon Feb 26 07:30:16 EST 2001


I remember there is a tool "xp" in the SP package (a toolkit for SGML
conversion). This utility can make SGML documents XML-compliant by fixing
several problems. I have used it in a project with about 10,000,000 SGML
documents where my own "SGML2XML" converter had problems due to invalid
tagging or wrong entities.
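
For the character-escaping part of the question, here is a minimal sketch,
assuming a Python version whose standard library provides both escape() and
unescape() in xml.sax.saxutils. The helper names to_xml_text and from_xml_text
and the sample string are made up for illustration. Note that only & and <
(and > in some contexts) need escaping in XML character data; a plain ; is
legal as it stands.

```python
from xml.sax.saxutils import escape, unescape


def to_xml_text(raw):
    """Escape &, < and > so that raw can be used as XML character data."""
    return escape(raw)


def from_xml_text(escaped):
    """Reverse to_xml_text, restoring the original characters."""
    return unescape(escaped)


# Hypothetical legacy string containing the troublesome characters.
legacy = "Fish & Chips <cheap>; open late"
safe = to_xml_text(legacy)

print(safe)
print(from_xml_text(safe) == legacy)
```

escape() handles & before < and >, so text that already contains entity
references would be escaped a second time; run it only on the raw legacy data.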

Andreas
----- Original Message -----
From: "Thomas Weholt" <thomas at cintra.no>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Monday, February 26, 2001 3:38 AM
Subject: HowTo Fix illegal chars in xml-documents??


> Hi,
>
> I'm trying to "convert" a lot of legacy information into a collection of
> xml-documents. In the old data there are a lot of characters that are illegal
> if used in the character part of the XML, like &, ;, < and >. I've tried to
> make a quick fix for this, to swap these characters with the proper
> HTML alternative, like &amp; for & etc., but haven't found a really good way
> of doing this. I need to be able to swap the old chars back too, but that's
> not very important now. Does anybody have a clue or a piece of code to get
> me going? As of now, a great deal of the converted documents are not proper
> xml-documents and will not get parsed at all.
>
> Thanks in advance,
> Thomas