> I have a large amount of RTF files where the only thing in them is an
> image.  I would like to extract them and save them as a png.
> Eventually, I would like to also grab some text that is on the image.
> I think PIL has something for this.
> Does anyone have any suggestion on how to start this?

I'm no kind of expert, but I do have a pointer or two...  RTF files are text
with lots and lots of funky-looking formatting, but generally not "binary"
in the sense of requiring special handling (although, now that I just read
about how pictures are stored in them, it seems there might be some
exceptions...)  There's a Python library for dealing with RTF files
(http://www.nava.de/2005/04/06/pyrtf/), but I haven't tried it; if you're
comfortable opening text files and handling their contents, it might be
simpler to roll your own for this task.

You'll want to look at the Microsoft RTF specification, the latest version
of which (1.6) is available here:

In particular, you'll be interested in the section on Pictures, which I'll
excerpt here:

An RTF file can include pictures created with other applications. These
pictures can be in hexadecimal (the default) or binary format. Pictures are
destinations, and begin with the \pict control word. The \pict keyword
is preceded by \*\shppict destination control keyword as described in the
following example. A picture destination has the following syntax:

  <pict>          '{' \pict (<brdr>? & <shading>? & <picttype> & <pictsize> &
                  <metafileinfo>?) <data> '}'
  <picttype>      \emfblip | \pngblip | \jpegblip | \macpict | \pmmetafile |
                  \wmetafile | \dibitmap <bitmapinfo> | \wbitmap <bitmapinfo>
  <bitmapinfo>    \wbmbitspixel & \wbmplanes & \wbmwidthbytes
  <pictsize>      (\picw & \pich) \picwgoal? & \pichgoal? \picscalex? &
                  \picscaley? & \picscaled? & \piccropt? & \piccropb? &
                  \piccropr? & \piccropl?
  <metafileinfo>  \picbmp & \picbpp
  <data>          (\bin #BDATA) |

Basically, it looks like you can search for "{\pict", then search for the
closing "}".  Everything in between will be your picture, plus metadata that
tells you how to decode it.
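
Here's a rough, untested-on-real-files sketch of that search in Python, in case it helps you get started.  The function name extract_pictures and the regular expression are my own inventions, not part of any library.  It assumes the pictures are stored as hexadecimal text (the default, per the excerpt above), that the picture group contains no escaped braces, and that the hex data is well formed; it does not handle \bin binary data.  Because a picture group can contain nested groups, it counts braces rather than stopping at the first "}".

```python
import re

# Matches an RTF control word such as \pngblip or \picw100, plus the single
# space that may terminate it.
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+-?\d* ?")


def extract_pictures(rtf_text):
    """Return (kind, data) tuples for the hex-encoded pictures in rtf_text."""
    pictures = []
    start = rtf_text.find("{\\pict")
    while start != -1:
        # Walk forward to the brace that closes this group, counting nesting
        # so that braces of nested groups are not mistaken for the end.
        depth = 0
        end = start
        top_level = []
        for end in range(start, len(rtf_text)):
            ch = rtf_text[end]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    break
            elif depth == 1:
                # Keep only text directly inside the \pict group.
                top_level.append(ch)
        body = "".join(top_level)

        # The picture type keyword says how the bytes should be interpreted.
        if "\\pngblip" in body:
            kind = "png"
        elif "\\jpegblip" in body:
            kind = "jpeg"
        else:
            kind = "other"

        # Drop the control words, then everything that is not a hex digit
        # (spaces and line breaks), and decode what is left.
        hex_digits = re.sub(r"[^0-9a-fA-F]", "", CONTROL_WORD.sub("", body))
        pictures.append((kind, bytes.fromhex(hex_digits)))

        start = rtf_text.find("{\\pict", end + 1)
    return pictures
```

If a picture comes back with kind "png", the bytes should already be a PNG file, so writing them out with open(name, "wb") would be enough; other types would need converting with something like PIL.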

Now that you've caught your rabbit... I'm out of advice; I've never used PIL
(though I used to listen to them all the time.)


