about Python doc reader
Tim Golden
mail at timgolden.me.uk
Wed May 13 09:01:59 EDT 2009
Shailja Gulati wrote:
> Hi ,
>
> I am currently working on "Information retrieval from semi structured
> Documents" in which there is a need to read data from Resumes.
>
> Could anyone tell me is there any python API to read Word doc?
If you haven't already, get hold of the pywin32 extensions:
http://pywin32.sf.net
<code>
import win32com.client
doc = win32com.client.GetObject ("c:/temp/temp.doc")
text = doc.Range ().Text
</code>
Note that this will give you a unicode object with \r line-delimiters.
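If you'd rather work with ordinary lines, something along these lines should do. It's only a sketch: split_word_text is a made-up helper name, and the sample string stands in for whatever the document's text actually contains.

```python
def split_word_text(text):
    # Normalise any "\r\n" pairs first so they are not split twice,
    # then treat every remaining "\r" as a line break.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.split("\n")

# Hypothetical sample standing in for the text read from the document
sample = "Name: A. Candidate\rSkills: Python\r"
lines = split_word_text(sample)
```

A trailing \r leaves an empty string at the end of the list, so filter out blanks if they get in the way.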
You could read para by para if that were more useful:
<code>
import win32com.client
doc = win32com.client.GetObject ("c:/temp/temp.doc")
# Paragraph.Range is a property; its .Text gives the paragraph as a string
lines = [p.Range.Text for p in doc.Paragraphs]
</code>
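Once you have each paragraph's text as a string, you'll probably want to tidy it up before doing any retrieval work. A rough sketch, assuming each string still carries its trailing \r paragraph mark (and, as far as I know, a \x07 marker at the end of table cells); clean_paragraphs and the sample data are made up for illustration:

```python
def clean_paragraphs(raw_paragraphs):
    cleaned = []
    for raw in raw_paragraphs:
        # Remove paragraph marks, cell markers and surrounding whitespace
        text = raw.strip("\r\n\x07 \t")
        # Skip paragraphs that were empty apart from those characters
        if text:
            cleaned.append(text)
    return cleaned

# Hypothetical sample standing in for text pulled from a resume
sample = ["Curriculum Vitae\r", "\r", "  Python, SQL \r", "Cell text\r\x07"]
paragraphs = clean_paragraphs(sample)
```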
TJG