Forgot to Reply All.<br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Marc Tompkins</b> <span dir="ltr"><<a href="mailto:marc.tompkins@gmail.com">marc.tompkins@gmail.com</a>></span><br>
Date: Sat, Feb 14, 2009 at 11:35 AM<br>Subject: Re: [Tutor] Extract image from RTF file<br>To: Bryan Fodness <<a href="mailto:bryan.fodness@gmail.com">bryan.fodness@gmail.com</a>><br><br><br><div class="gmail_quote">
<div class="Ih2E3d">On Sat, Feb 14, 2009 at 8:40 AM, Bryan Fodness <span dir="ltr"><<a href="mailto:bryan.fodness@gmail.com" target="_blank">bryan.fodness@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I have a large number of RTF files where the only thing in them is an<br>
image. I would like to extract the images and save them as PNGs.<br>
Eventually, I would like to also grab some text that is on the image.<br>
I think PIL has something for this.<br>
<br>
Does anyone have any suggestion on how to start this?<br>
</blockquote></div><div><br>I'm no kind of expert, but I do have a pointer or two... RTF files are text with lots and lots of funky-looking formatting, but generally not "binary" in the sense of requiring special handling (although, now that I just read about how pictures are stored in them, it seems there might be some exceptions...) There's a Python library for dealing with RTF files (<a href="http://www.nava.de/2005/04/06/pyrtf/" target="_blank">http://www.nava.de/2005/04/06/pyrtf/</a>) but I haven't tried it; if you're comfortable opening text files and handling their contents, it might be simpler to roll your own for this task.<br>
<br>You'll want to look at the Microsoft RTF specification; version 1.6 is available here: <br> <a href="http://msdn.microsoft.com/en-us/library/aa140277%28office.10%29.aspx" target="_blank">http://msdn.microsoft.com/en-us/library/aa140277(office.10).aspx</a><br>
<br>In particular, you'll be interested in the section on Pictures, which I'll excerpt here:<br><h2>Pictures</h2> <p>An RTF file can include pictures
created with other applications. These pictures can be in hexadecimal
(the default) or binary format. Pictures are destinations, and begin
with the \<b>pict</b> control word. The <b><code>\</code>pict</b> keyword is preceded by<b> \*\shppict</b> destination control keyword as described in the following example. A picture destination has the following syntax:</p>
<div><table> <tbody><tr valign="top"> <td width="22%">&lt;pict&gt;</td> <td width="78%">'{' <b><code>\</code>pict</b>
(&lt;brdr&gt;? & &lt;shading&gt;? & &lt;picttype&gt; &
&lt;pictsize&gt; & &lt;metafileinfo&gt;?) &lt;data&gt; '}'</td> </tr> <tr valign="top"> <td width="22%">&lt;picttype&gt;</td> <td width="78%">|<b> \emfblip</b> |<b> \pngblip</b> |<b> \jpegblip | \macpict</b> | <b><i><code>\</code>pmmetafile</i></b> | <b><i><code>\</code>wmetafile</i></b> | <b><i><code>\</code>dibitmap</i></b> &lt;bitmapinfo&gt; | <b><i><code>\</code>wbitmap </i></b>&lt;bitmapinfo&gt;</td>
</tr> <tr valign="top"> <td width="22%">&lt;bitmapinfo&gt;</td> <td width="78%"><b><i><code>\</code>wbmbitspixel</i> </b>& <b><i><code>\</code>wbmplanes</i></b> & <b><i><code>\</code>wbmwidthbytes</i></b></td> </tr>
<tr valign="top"> <td width="22%">&lt;pictsize&gt;</td> <td width="78%">(\<b><i>picw</i></b> & <b><i><code>\</code>pich</i></b>) \<i>picwgoal</i>? & \<i>pichgoal</i>? <b><i><code>\</code>picscalex</i></b>? & <b><i><code>\</code>picscaley</i></b>? & <b><code>\</code>picscaled</b>? & <b><i><code>\</code>piccropt</i></b>? & <b><i><code>\</code>piccropb</i></b>? & <b><i><code>\</code>piccropr</i></b>? & <b><i><code>\</code>piccropl</i></b>?</td>
</tr> <tr valign="top"> <td width="22%">&lt;metafileinfo&gt;</td> <td width="78%"><b><code>\</code>picbmp </b>& <b><code>\</code><i>picbpp</i></b></td> </tr> <tr valign="top"> <td width="22%">&lt;data&gt;</td> <td width="78%">
(\<b><i>bin</i></b> #BDATA) | #SDATA</td> </tr> </tbody></table></div> <p><br></p><p>Basically, it looks like you can search for "{\pict", then search for the closing "}". Everything in between will be your picture (hex-encoded by default, per the excerpt above), plus the control words that tell you how to decode it.<br>
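</p><p>Here is a minimal Python sketch of that search, not a full RTF parser. It assumes the picture data is hexadecimal (not \bin), that the \pict group contains no nested "{...}" groups, and that the file has already been read into a string. The names extract_pictures, PICT_TYPES and CONTROL_WORD are made up for illustration; the extension mapping is my own guess at sensible file suffixes, not something the specification defines.</p>
<pre>
```python
import re

# Control words from the picttype row of the excerpt, mapped to file
# extensions (the mapping itself is an assumption for illustration).
PICT_TYPES = {"pngblip": ".png", "jpegblip": ".jpg", "emfblip": ".emf"}

# A control word: backslash, letters, optional numeric parameter, and the
# single space that may delimit it from what follows.
CONTROL_WORD = re.compile(r"\\([a-z]+)-?\d* ?")


def extract_pictures(rtf_text):
    """Return [(extension, image_bytes), ...] for hex-encoded \\pict groups."""
    marker = "{\\pict"
    pictures = []
    start = rtf_text.find(marker)
    while start != -1:
        end = rtf_text.find("}", start)
        if end == -1:
            break  # unterminated group: stop looking
        body = rtf_text[start + len(marker):end]
        # Names of the control words, e.g. ["pngblip", "picw", "pich"].
        words = CONTROL_WORD.findall(body)
        # Whatever is left after dropping control words and whitespace
        # should be the hex digits of the picture.
        hex_data = "".join(CONTROL_WORD.sub("", body).split())
        try:
            data = bytes.fromhex(hex_data)
        except ValueError:
            data = None  # nested group or binary data: not handled here
        if data:
            ext = next((PICT_TYPES[w] for w in words if w in PICT_TYPES), ".bin")
            pictures.append((ext, data))
        start = rtf_text.find(marker, end)
    return pictures
```
</pre>
<p>If the group is marked \pngblip, the decoded bytes should already be a complete PNG file, so writing them out with open(name, "wb") ought to be enough. For other picture types, something like Image.open(io.BytesIO(data)).save("out.png") from PIL may do the conversion, though I haven't tried it on RTF output.<br>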
</p><p>Now that you've caught your rabbit... I'm out of advice; I've never used PIL (though I used to listen to them all the time.)<br></p></div></div><font color="#888888"><br>-- <br><a href="http://www.fsrtechnologies.com" target="_blank">www.fsrtechnologies.com</a><br>
</font></div><br>