[Tutor] PDF to text conversion

Robert Berman bermanrl at cfl.rr.com
Wed Apr 22 00:37:39 CEST 2009


First, thanks to everyone who contributed to this thread. I now have 
several possible solutions and paths to pursue in deciding how to 
resolve this remaining issue. I did try the itools library; everything 
installed nicely, but most of the tests failed, so I am not 
particularly overjoyed with the results.

Thank you Dinesh for the vote of sympathy. I do appreciate it.

I did use Adobe Reader to convert the history PDF file into a text file 
and it did seem to do it faithfully. So now I will work out a parsing 
function to extract my data and send it to an SQLite database.
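
A minimal sketch of what such a parsing function might look like, using 
the sqlite3 module from the standard library and assuming Python 3. The 
line layout (date, description, amount), the regular expression, the 
table name "history" and the function names are all assumptions made 
for illustration; the real text file will need its own pattern.

```python
import re
import sqlite3

# Hypothetical line layout: "2009-04-21  Some description  12.50"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?\d+\.\d{2})$")


def parse_line(line):
    """Return (date, description, amount), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date, description, amount = match.groups()
    return date, description, float(amount)


def load_history(lines, db_path=":memory:"):
    """Parse the lines and insert every matching one into an SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history "
        "(entry_date TEXT, description TEXT, amount REAL)"
    )
    # Lines that do not match (page headers, blank lines) are skipped.
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

With a real file, something like load_history(open("history.txt"), 
"history.db") would do the work, where both file names are placeholders.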

I am thrilled both with the number of suggestions I have received from 
this group and the quality of the suggestions.

Thanks again,

Robert Berman



 

Norman Khine wrote:
> the itools library from hforge.org has a PDF2TEXT implementation itools.pdf
>
> http://www.hforge.org/itools
>
> norman
>
> On Tue, Apr 21, 2009 at 8:44 PM, Dayo Adewunmi <contactdayo at gmail.com> wrote:
>   
>> Emile van Sebille wrote:
>>     
>>> Robert Berman wrote:
>>> <snip>
>>>
>>>       
>>>> Have any of you worked with such a library, or do you know of one or two
>>>> I can download and work with? Hopefully, they have reasonable documentation.
>>>>
>>>> My development environment is:
>>>>
>>>> Python
>>>> Linux
>>>> Ubuntu version 8.10
>>>>
>>>>         
>>> I've used
>>>
>>> [root at fcfw2 /]# /usr/bin/pdftotext -v
>>> pdftotext version 2.01
>>> Copyright 1996-2002 Glyph & Cog, LLC
>>> [root at fcfw2 /]# cat /etc/issue
>>> Red Hat Linux release 9 (Shrike)
>>>
>>>
>>> HTH,
>>>
>>> Emile
>>>
>>> _______________________________________________
>>> Tutor maillist  -  Tutor at python.org
>>> http://mail.python.org/mailman/listinfo/tutor
>>>
>>>       
>> Hi Robert,
>> pdftotext is part of poppler-utils, an Ubuntu package which can be installed
>> like so:
>>
>> sudo aptitude install poppler-utils
>>
>> But I too would be interested in finding a python library/module for this.
>>
>> Regards,
>>
>> Dayo
>>
>>     
>
>   
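
For the pdftotext route quoted above, a minimal sketch of driving the 
command-line tool from Python rather than using a dedicated library. It 
assumes Python 3.7 or later and that pdftotext (from poppler-utils) is 
on the PATH; the function names are hypothetical. Passing "-" as the 
output file makes pdftotext write the text to standard output.

```python
import shutil
import subprocess


def build_command(pdf_path, keep_layout=True):
    """Build the pdftotext argument list; "-" sends the text to stdout."""
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")  # preserve the physical column layout
    command += [pdf_path, "-"]
    return command


def pdf_to_text(pdf_path, keep_layout=True):
    """Run pdftotext on one file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils first")
    result = subprocess.run(
        build_command(pdf_path, keep_layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the conversion fails
    )
    return result.stdout
```

The -layout option tends to help when the PDF holds tabular data, since 
columns stay aligned and are easier to split afterwards.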

