[Doc-SIG] ASCII to Word?

David Goodger goodger@users.sourceforge.net
Tue, 04 Jun 2002 00:04:49 -0400


Aahz wrote:
> All right, I'm ready to get started on writing my book.

Great!  What's the title and/or subject, if you can say?  Are there
any special features you'll need?

> My publisher wants the document in Word format.

My condolences.

> I'm starting in some form of ASCII, quite likely reST.

Glad to hear it.  I (and hopefully others involved with Docutils) will
be happy to help.

But are there any requirements other than "Word"?  When I wrote
a chapter on Python for Wrox [*]_, they sent me an elaborate (and
broken) stylesheet to use.  Anything like that?  Could be significant.

.. [*] *Professional Linux Programming*, chapter 15.  My mug is at the
   far right, second from the top.

Once you produce a Word document, I assume your publisher will add
comments to the Word files and send them back to you for revision.
(Wrox loved using Word's comment feature.)  Will your publisher
insist on preserving the comments and revision history?  If they do
insist on "read-write" Word files, you're screwed, and are stuck with
Word.  If they'll accept fresh, uncommented drafts each time, then
using a toolchain to generate Word files as a final, read-only display
format should be feasible.  On second thought, I think you can merge
versions of documents in Word, so you might be safe either way.

> What do people recommend as the best path for getting from here to
> there?

My first thought on an interchange format would be RTF.  I believe RTF
can identify Word stylesheet styles by name, but I don't know it well.
However:

[Engelbert Gruber]
>> For going to Word, HTML might not be a bad option, as Word would
>> read it, as I heard. What does "Word" actually mean? Is a
>> Word-readable format definitely something different? Would RTF be
>> Word enough?

[Aahz]
> RTF is an old format with few features; in particular, it doesn't
> support index tags.

If anyone cares to dig in deep, the RTF 1.6 spec is online at:
http://msdn.microsoft.com/library/?url=/library/en-us/dnrtfspec/html/rtfspec.asp
The 1.7 spec can be downloaded from:
http://download.microsoft.com/download/Word2002/Install/1.7/W98NT42KMeXP/EN-US/W2KRTFSF.exe
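
On the question of whether RTF can identify styles by name: a minimal
sketch in Python, assuming the spec's ``\stylesheet`` group works the
way it reads (each entry pairs a style number with a name, and each
paragraph refers to the number).  The function names and the style
names "Heading 1" and "Body Text" are made up for illustration, and
whether Word maps them onto a publisher's template is untested.

```python
def rtf_escape(text):
    """Escape the three characters that are special in RTF.

    Assumption: the input is plain ASCII; non-ASCII text would need
    additional handling that this sketch leaves out.
    """
    for ch in ("\\", "{", "}"):
        text = text.replace(ch, "\\" + ch)
    return text


def make_rtf(style_names, paragraphs):
    """Build a tiny RTF document whose paragraphs use named styles.

    style_names: ordered list of style names for the stylesheet.
    paragraphs:  list of (style_name, text) tuples.
    """
    # Number the styles from 1; the stylesheet maps number -> name.
    numbers = {name: n for n, name in enumerate(style_names, start=1)}
    sheet = "".join(
        "{\\s%d %s;}" % (n, rtf_escape(name)) for name, n in numbers.items()
    )
    # Each paragraph points at its style by number, not by name.
    body = "".join(
        "\\pard\\s%d %s\\par\n" % (numbers[style], rtf_escape(text))
        for style, text in paragraphs
    )
    return "{\\rtf1\\ansi{\\stylesheet%s}\n%s}" % (sheet, body)


if __name__ == "__main__":
    print(make_rtf(
        ["Heading 1", "Body Text"],
        [("Heading 1", "Chapter One"),
         ("Body Text", "Braces {like these} are escaped.")],
    ))
```

The names live only in the stylesheet; the body carries numbers, so a
generator has to emit both consistently.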

> I suspect HTML would have the same problem; I'll assume it does
> unless someone can tell me otherwise from direct experience.

With the "class" attribute on "span" and "div" elements, HTML can
represent literally anything, if indirectly and inelegantly.  The
problem is, can the software *reading* the HTML (Word, in this case)
*do* anything with this information?  Probably not.  You'll need more
direct access to native features.
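
To illustrate what "indirectly and inelegantly" might look like, here
is a rough Python sketch that tags an index term with a "class"
attribute on generic "span" and "div" elements.  The helper names and
the class names "index-term" and "body-text" are invented for this
example; nothing here suggests Word would interpret them.

```python
from html import escape


def span(text, css_class):
    """Wrap text in a <span> carrying its meaning in the class attribute."""
    return '<span class="%s">%s</span>' % (
        escape(css_class, quote=True), escape(text))


def block(parts, css_class):
    """Build a <div> from plain strings and (text, class) tuples."""
    inner = "".join(
        span(*part) if isinstance(part, tuple) else escape(part)
        for part in parts
    )
    return '<div class="%s">%s</div>' % (escape(css_class, quote=True), inner)


if __name__ == "__main__":
    # Hypothetical class names; a reader of the HTML must know them.
    print(block(["See ", ("generators", "index-term"), "."], "body-text"))
```

The information survives in the markup, but only software that already
knows the class names can act on it.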

> Is anyone else using reST for long technical documents?

Probably the longest existing documents in reStructuredText are those
*describing* reStructuredText and Docutils: the specs and user docs.
They're probably book-length in total.  But I'm happy you're working
to topple the record.

> In the absence of better advice, I'm going to convert to
> OpenOffice's XML format, then use OpenOffice to convert to Word.

I found docs on the StarOffice XML spec (which is the draft for
OpenOffice) at http://xml.openoffice.org/.  It looks much more
promising than RTF.  If it's a reasonable markup -- and it appears to
be so -- then I don't foresee any major problems with an "OpenOffice
Writer" for Docutils.  Any takers?

> What I'd really like is a DocBook-to-Word converter, but I haven't
> seen anything like that.  If I don't get anything here, my next step
> is to ask on comp.text.xml.

I wouldn't be surprised if DocBook-to-Word converters *do* exist.  The
question then becomes, how to produce the DocBook?  Would you still
use reStructuredText?  A "DocBook Writer" for Docutils shouldn't be
too challenging.
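
As a rough idea of the shape such a writer could take, here is a
sketch that turns a very simple outline into DocBook markup with
Python's standard library.  It does not use the Docutils document tree
or any Docutils API; the ``to_docbook`` function and its input format
(a title plus a list of (heading, paragraphs) pairs) are assumptions
made up for this example.

```python
import xml.etree.ElementTree as ET


def to_docbook(title, sections):
    """Serialize a flat outline as a DocBook <article>.

    title:    the article title.
    sections: list of (heading, [paragraph_text, ...]) tuples.
    """
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    for heading, paragraphs in sections:
        section = ET.SubElement(article, "section")
        ET.SubElement(section, "title").text = heading
        for text in paragraphs:
            ET.SubElement(section, "para").text = text
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(to_docbook(
        "A Sample Book",
        [("Introduction", ["First paragraph.", "Second paragraph."])],
    ))
```

A real writer would walk the parsed document instead of a hand-built
list, and would have to cover far more than titles and paragraphs.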

> One way or another, I'm assuming that Python will be used for at
> least the first stage of translation.

Just so long as you understand the current limitations, and you're
realistic in your expectations.  Docutils has a lot of growing and
evolving yet to come.  A lot of the support you need isn't in place
yet.  How much time will *you* be able to put in to the tools?  If
you're under serious deadline pressure, you may want to reconsider.
Docutils is progressing, but only in people's spare time.

Did that scare you off?  If not, I'm sure that together we can come up
with a decent system.  Input from real-life usage like this is
invaluable.  Docutils will undoubtedly benefit.

Please join the Docutils implementation mailing list:
http://lists.sourceforge.net/lists/listinfo/docutils-develop/.
Mostly, we discuss the nitty-gritties there, and high-level stuff
here.

-- 
David Goodger  <goodger@users.sourceforge.net>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/