[Tutor] Hoping to benefit from someone's experience...

Marc Tompkins marc.tompkins at gmail.com
Wed Apr 16 04:03:44 CEST 2008


Does anyone out there have experience with:
-  manipulating RTF files?
-  or writing OpenOffice macros in Python?

I need to pre-process approximately 10,000 medical reports so they can be
imported into an EMR.  (They were originally saved as Word .docs; I'd like
to give hearty thanks to the authors of "ooconvert" (
http://sourceforge.net/projects/ooconvert/), which enabled me to do a batch
conversion with much less effort than I was expecting...)

Anyway, the files as they now exist each contain a single section, with a
header and footer on each page.
The EMR's import function wants to see a slug of summary information as the
first thing on the first page, which means that the header needs to be
suppressed; however, I expect that the users will want to reprint these
things in the future, so I don't want to delete the header entirely.  In
short, what I want to do is insert a new section at the top of the file.

My tasks are:
- figure out which codes I need to insert to create a new section with no
header, and then re-enable the header at the end of that section
- figure out where in the file to do the inserting (I thought I already had
this worked out, but apparently not quite)
THEN
- figure out how to find the correct insertion point programmatically -
either agnostically, by finding a particular pattern of symbols that occur
in the right location, or by actually parsing the RTF hierarchy to figure
out where the meta-document ends and the document begins.  The agnostic
solution would be much easier - and this is a one-off, so I'm not building
for the ages here - but it really looks like homogeneous tag soup to me.  I
have, of course, tried inserting the section myself and then compared the
before-and-after files... but all I've got so far is a headache...  (Not
quite true - I think I'm close - but I'm getting angrier with Microsoft with
every moment I spend looking at this stuff.  Hyper-optimized Perl is a
freakin' marvel of clarity next to this... )
{\footerr \ltrpar \pard\plain \ltrpar\s22\qc
\li0\ri0\nowidctlpar\tqc\tx4153\tqr\tx8306\wrapdefault\faauto\rin0\lin0\itap0
\rtlch\fcs1 \af0\afs20\alang1025 \ltrch\fcs0
\fs24\lang3081\langfe255\cgrid\langnp3081\langfenp255
{\rtlch\fcs1 \af0 \ltrch\fcs0 \f1\fs16\insrsid5703726 \par }
\pard \ltrpar\s22\qc
\li0\ri0\nowidctlpar\tqc\tx4153\tqr\tx8306\wrapdefault\faauto\rin0\lin0\itap0\pararsid5703726

{\rtlch\fcs1 \af0\afs24 \ltrch\fcs0
\fs16\lang3081\langfe1033\loch\af1\hich\af43\langfenp1033\insrsid5703726
GAAAAAAAHHH!
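
One way the agnostic approach might look, as a rough sketch rather than a tested solution: walk the file while tracking brace depth, and treat the first \sectd control word at depth 1 as the start of the first section. The names find_first_sectd and insert_leading_section are made up for illustration. The sketch assumes the file contains no \bin data, and that a section emitted ahead of the existing header groups prints without a header (RTF headers are defined per section). Whether Word and the EMR importer accept the result would still need checking.

```python
import re

# An RTF control word: backslash, letters, optional numeric parameter,
# and an optional single space that acts as the delimiter.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?")


def find_first_sectd(rtf):
    # Return the index of the first "sectd" control word that sits directly
    # inside the outermost group (brace depth 1), or -1 if there is none.
    depth = 0
    i = 0
    n = len(rtf)
    while i < n:
        ch = rtf[i]
        if ch == "\\":
            # Escaped literal backslash or brace: skip both characters so
            # the brace is not counted as a group delimiter.
            if i + 1 < n and rtf[i + 1] in "\\{}":
                i += 2
                continue
            m = CONTROL_WORD.match(rtf, i)
            if m:
                if depth == 1 and m.group(1) == "sectd":
                    return i
                i = m.end()
                continue
            # Some other control symbol (for example \~ or \'hh).
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        i += 1
    return -1


def insert_leading_section(rtf, slug_rtf):
    # Put a new section, holding only the slug, in front of the first
    # existing section. slug_rtf must already be valid RTF text.
    pos = find_first_sectd(rtf)
    if pos < 0:
        raise ValueError("no top-level sectd control word found")
    new_section = "\\sectd " + slug_rtf + "\\par \\sect "
    return rtf[:pos] + new_section + rtf[pos:]
```

Because the original \sectd and everything after it are left untouched, the existing header and footer groups stay in the file for later reprints.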

It occurs to me that there might be another way - maybe I can automate
OpenOffice to open each file and insert the blank section and some dummy
text, and then, in Python, find the dummy text and replace it with the
summary slug?   Maybe even do the whole job with a macro?  And never have to
poke through RTF again?
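
If the dummy-text route works, the Python half might be as small as the following. This is a sketch only: the marker string XXSLUGXX and the helper names rtf_escape and replace_placeholder are invented for illustration. It assumes the marker survives in the saved RTF as one unbroken run of plain characters. Word processors sometimes split text across formatting runs, in which case a plain string search will not find it.

```python
import struct


def rtf_escape(text):
    # Escape plain text so it can be dropped into an RTF body.
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)        # literal backslash or brace
        elif ch == "\n":
            out.append("\\par ")         # newline becomes a paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN with signed 16-bit UTF-16 code units,
            # each followed by "?" as the fallback for older readers.
            raw = ch.encode("utf-16-le")
            for unit in struct.unpack("<%dh" % (len(raw) // 2), raw):
                out.append("\\u%d?" % unit)
    return "".join(out)


def replace_placeholder(rtf, placeholder, slug):
    # Swap the marker for the escaped slug; refuse to guess if the marker
    # is missing or appears more than once.
    found = rtf.count(placeholder)
    if found != 1:
        raise ValueError("expected exactly one marker, found %d" % found)
    return rtf.replace(placeholder, rtf_escape(slug))
```

Failing loudly when the marker count is not exactly one seems safer for a 10,000-file batch than silently writing a report with no slug or with two.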

So I was juggling the RTF spec (the RTFM?), a text editor, Word (so I can
make sure the thing still looks right), and the import utility - when it
suddenly struck me that someone out there may have done this before.  (And
yes, I've definitely Googled, but my Google-fu may be weak today.)  If
anyone has insight into RTF Zen, or has some tips on batch macros in OOo, I'd
be obliged...

Marc
-- 
www.fsrtechnologies.com