about Python doc reader

norseman norseman at hughes.net
Wed May 13 17:00:58 EDT 2009


Tim Golden wrote:
> Shailja Gulati wrote:
>> Hi,
>>
>> I am currently working on "Information retrieval from semi-structured
>> documents", which requires reading data from resumes.
>>
>> Could anyone tell me whether there is a Python API for reading Word documents?
> 
> If you haven't already, get hold of the pywin32 extensions:
> 
>  http://pywin32.sf.net
> 
> <code>
> import win32com.client
> 
> doc = win32com.client.GetObject ("c:/temp/temp.doc")
> text = doc.Range ().Text
> 
> </code>
> 
> Note that this will give you a unicode object with \r line-delimiters.
> You could read para by para if that were more useful:
> 
> <code>
> import win32com.client
> 
> doc = win32com.client.GetObject ("c:/temp/temp.doc")
> lines = [p.Range.Text for p in doc.Paragraphs]
> 
> </code>
> 
> TJG
=======================
I saw this right after responding to Kushal's 5:37 AM posting today.

Thank you for the tip.  I'll try these first chance I get.
Word, swriter, whatever - I'm not partial when it comes to automating.
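
For anyone who wants to experiment before pointing this at a real file,
here is a minimal sketch of dealing with the \r delimiters Tim mentions.
It is plain Python only. The function name split_paragraphs and the sample
string are made up for illustration; the string just stands in for whatever
doc.Range ().Text hands back.

```python
def split_paragraphs(text):
    """Split text taken from a Word document into non-empty paragraphs."""
    # Word marks the end of each paragraph with a carriage return,
    # so split on that character and discard blank entries.
    parts = text.split("\r")
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample standing in for the text read from a resume.
sample = "Jane Doe\rPython developer\r\rSkills: COM, text parsing\r"
paragraphs = split_paragraphs(sample)
print(paragraphs)
```

Dropping the blank paragraphs is my assumption about what a resume parser
would want; keep them if the spacing carries meaning.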


Today is: 20090513

Steve


