[XML-SIG] Advice needed: RTF->XML conversions
Jeremy J. Sydik
jsydik@virtualparadigm.com
Thu, 17 May 2001 18:14:30 -0500
---------------------------------------------------------------------------
Martin is right. The Office/Word 'XML' can be a difficult thing to work
with. It's been a while since I've thought about it, but you will probably
need to account for the following:
* Not all attributes are quoted
* Singleton tags aren't closed. (This can be dealt with fairly easily,
however. It's simply the 'standard' singleton HTML tags that
occur this way: br, img, etc.)
* There are a few microsoft namespaces to deal with, as well as
special tags. The documentation for these is found in:
http://msdn.microsoft.com/library/officedev/ofxml2k/ofhtml9.exe
The primary ones you'll probably encounter are o: and w:
* Also described in this document are
<!--[if condition]>...<![endif]-->
and
<![if condition]>...<![endif]>
pairs. These break most SGML
and XML implementations. (It would be good to think of a regex
solution, since you'll probably need one to properly quote
the attribute values anyway.)
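A minimal sketch of such a regex pass in Python, covering the items in the
list above. The function name, the list of singleton tags, and the choice to
drop <!--[if ...]> blocks while keeping the content between <![if ...]>
markers are all assumptions made for illustration. It will not cope with
attribute values that contain '>' or with tags split in unusual ways, so
treat it as a starting point rather than a complete cleaner.

```python
import re

# The "standard" singleton HTML tags (br, img, etc.). This exact list is an
# assumption; extend it to match whatever your documents contain.
VOID_TAGS = ("br", "hr", "img", "meta", "link", "input")

# <!--[if condition]> ... <![endif]--> blocks (removed entirely here).
HIDDEN_COND = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.S | re.I)
# <![if condition]> and <![endif]> markers (markers removed, content kept).
REVEALED_MARK = re.compile(r"<!\[(?:if[^\]]*|endif)\]>", re.I)
# Any start tag; end tags, comments and declarations are left alone.
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")
# name=value where value is double-quoted, single-quoted, or bare.
ATTR = re.compile(r"([\w:.-]+)\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s\"'>]+)")
# A singleton tag, with or without an existing trailing slash.
VOID = re.compile(
    r"<(%s)(?=[\s/>])([^<>]*?)\s*/?>" % "|".join(VOID_TAGS), re.I
)


def _quote_attrs(tag_match):
    """Put double quotes around every bare attribute value in one tag."""
    def fix(m):
        name, value = m.group(1), m.group(2)
        if value[0] in "\"'":
            return m.group(0)  # already quoted, leave untouched
        return '%s="%s"' % (name, value)
    return ATTR.sub(fix, tag_match.group(0))


def clean_word_html(text):
    """Hypothetical helper: apply the three fixes in a safe order."""
    text = HIDDEN_COND.sub("", text)         # drop hidden conditional blocks
    text = REVEALED_MARK.sub("", text)       # drop the bare markers only
    text = START_TAG.sub(_quote_attrs, text)  # quote attribute values
    text = VOID.sub(r"<\1\2/>", text)        # close singleton tags
    return text


if __name__ == "__main__":
    sample = ("<p class=MsoNormal><![if !supportLists]>1.<![endif]>Hi<br>"
              "<img src=a.gif width=10></p>")
    print(clean_word_html(sample))
```

The conditional blocks are stripped first so that the attribute and
singleton passes only ever see ordinary tags. The o: and w: prefixes are left
in place; a namespace-aware parser will still need declarations for them.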
Once those issues are addressed, you SHOULD have well-formed XML. If you
don't, chances are you haven't hit everything in this list :)
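One way to find out what is still wrong is to run the cleaned text through
expat and look at where it stops. The sketch below uses the standard
xml.parsers.expat module; the function name first_xml_error is made up for
this example. The parser is created without namespace processing, so
undeclared o: and w: prefixes will not be reported by this check.

```python
import xml.parsers.expat


def first_xml_error(text):
    """Return None if text parses, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True: this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


if __name__ == "__main__":
    for doc in ('<p class="a">x<br/></p>', "<p class=a>x</p>"):
        print(repr(doc), "->", first_xml_error(doc))
```

The reported line and column usually point straight at the item from the list
that was missed, such as a bare attribute value or an unclosed br.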
Good Luck,
Jeremy
-----Original Message-----
From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On
Behalf Of Martin v. Loewis
Sent: Thursday, May 17, 2001 1:15 PM
To: Mike.Olson@fourthought.com
Cc: tony.mcdonald@ncl.ac.uk; Alexandre.Fayolle@logilab.fr;
xml-sig@python.org
Subject: Re: [XML-SIG] Advice needed: RTF->XML conversions
> Can you send me a sample of the word XML output, and the format you're
> looking for. You can probably do it with a stylesheet as long as what
> word spits out really is XML.
It isn't. Most notably, attribute values are not enclosed in quotes.
I found that sgmlop can parse what word produces, though.
Regards,
Martin
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig