[Tutor] Library for .ppt to .txt conversion
Marc Tompkins
marc.tompkins at gmail.com
Fri May 30 18:57:03 CEST 2014
On Fri, May 30, 2014 at 2:41 AM, Aaron Misquith <aaronmisquith at gmail.com>
wrote:
> Like pypdf is used to convert pdf to text; is there any library that is
> used in converting .ppt files to .txt? Even some sample programs will be
> helpful.
>
I suspect you'd need to use PowerPoint itself to do that cleanly; you can
definitely drive PowerPoint from Python if you so desire, though:
http://www.s-anand.net/blog/automating-powerpoint-with-python/
If anybody's written a package to brute-force the text out of a .ppt file
without using PowerPoint, though, I'm unaware of it. That way lies
madness, I suspect. (The newer MS Office formats - .docx, .xlsx, .pptx - are
collections of XML files inside a renamed ZIP container; it should be fairly
easy to get the text out of a .pptx file using Python's zipfile module plus
any one of its XML libraries. But the older binary format is proprietary and
extremely scary.)
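As a rough illustration of the .pptx route mentioned above, here is a minimal sketch that uses only the standard library (zipfile and xml.etree.ElementTree). It rests on a few assumptions: slide parts are named ppt/slides/slideN.xml, their numbering matches the presentation order (the authoritative order is actually recorded in ppt/presentation.xml), and text lives in DrawingML <a:t> runs grouped into <a:p> paragraphs. It skips speaker notes and does nothing for old binary .ppt files. `pptx_to_text` is a hypothetical helper name, not an existing library API.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: text runs live in <a:t> elements inside <a:p> paragraphs.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

# Slide parts are assumed to be named ppt/slides/slide1.xml, slide2.xml, ...
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def pptx_to_text(path):
    """Return the text of a .pptx file (path or file-like object).

    Slides are separated by a blank line; paragraphs within a slide by a
    newline. Assumes slide order follows the number in each part name,
    which is typical but not guaranteed.
    """
    with zipfile.ZipFile(path) as zf:
        # Collect slide parts and sort them numerically (slide10 after slide2).
        slides = []
        for name in zf.namelist():
            m = SLIDE_RE.match(name)
            if m:
                slides.append((int(m.group(1)), name))
        slides.sort()

        blocks = []
        for _, name in slides:
            root = ET.fromstring(zf.read(name))
            lines = []
            # Join the <a:t> runs of each paragraph into one line of text.
            for para in root.iter(A_NS + "p"):
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    lines.append(text)
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

Usage would be something like `print(pptx_to_text("slides.pptx"))`, where the filename is just a placeholder. Sorting numerically rather than alphabetically keeps slide10.xml after slide2.xml; reading the real order from ppt/presentation.xml and its relationships file would be more robust but longer.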