[Tutor] PDF to text conversion

Martin Walsh mwalsh at mwalsh.org
Tue Apr 21 20:57:56 CEST 2009


Robert Berman wrote:
> Hello Emad,
> 
> I have looked closely at the documentation for pyPdf. It seems to
> treat the page as its smallest unit of work, and what I need is a
> line-by-line process to go from PDF format to text. I don't think
> pyPdf will meet my needs, but thank you for bringing it to my attention.
> 
> Thanks,
> 
> 
> Robert Berman

Have you looked at pdfminer?

http://www.unixuser.org/~euske/python/pdfminer/index.html

Looks promising.
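
Once you have plain text out of the PDF (from whatever tool ends up
working for you), the line-by-line part could look something like the
sketch below. It is only a sketch: the line layout (an mm/dd/yy date,
whitespace, then a 10-character result with no embedded spaces) is my
assumption based on your description, and LINE_RE, parse_readings, and
the sample lines are made-up names and data, not anything from your file.

```python
import re
from datetime import datetime

# Assumed layout: mm/dd/yy date, whitespace, then exactly 10
# non-space characters (digits and special characters).
LINE_RE = re.compile(r"^\s*(\d{2}/\d{2}/\d{2})\s+(\S{10})\s*$")


def parse_readings(lines):
    """Return a dict mapping date -> result string for matching lines."""
    readings = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headers, blank lines, anything unexpected
        day = datetime.strptime(match.group(1), "%m/%d/%y").date()
        readings[day] = match.group(2)
    return readings


# Hypothetical sample standing in for the extracted text.
sample = [
    "Reading history",
    "05/03/88  12-34+56/7",
    "",
    "05/06/88  98*76#54-3",
]
readings = parse_readings(sample)
```

The resulting dict is keyed by date, so it should load easily into
whatever small database you settle on. Note that %y maps two-digit
years 69-99 to 1969-1999 and 00-68 to 2000-2068, which covers a file
running from 1988 to the present.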

HTH,
Marty


> 
> Emad Nawfal (عماد نوفل) wrote:
>>
>>
>> On Tue, Apr 21, 2009 at 12:54 PM, bob gailer <bgailer at gmail.com> wrote:
>>
>>     Robert Berman wrote:
>>
>>         Hi,
>>
>>         I must convert a history file in PDF format that runs from May
>>         of 1988 to the current date. Readings are taken twice weekly and
>>         consist of the date taken (mm/dd/yy) and the result, which appears
>>         as a 10-character sequence of digits and special characters. This
>>         is obviously an easy setup for a very small database
>>         application with the date as the key and the result string as
>>         the data.
>>
>>         My problem is converting the PDF file into a text file which I
>>         can then read and process. I do not see any free Python
>>         libraries with this capability. I did see a PDFPILOT program
>>         for Windows, but this application is being developed on Linux
>>         and should also run on Windows, so I do not want to
>>         incorporate a Windows-only application.
>>
>>         I do not think I am breaking any new ground with this
>>         application. Have any of you worked with such a library, or do
>>         you know of one or two I can download and work with?
>>         Hopefully, they have reasonable documentation.
>>
>>
>>     If this is a one-time conversion, just use the "Save as Text"
>>     feature of Adobe Reader.
>>
>>
>>
>>         My development environment is:
>>
>>         Python
>>         Linux
>>         Ubuntu version 8.10
>>
>>
>>         Thanks for any help you might be able to offer.
>>
>>
>>         Robert Berman
>>         _______________________________________________
>>         Tutor maillist  -  Tutor at python.org
>>         http://mail.python.org/mailman/listinfo/tutor
>>
>>
>>
>>     --
>>     Bob Gailer
>>     Chapel Hill NC
>>     919-636-4239
>>
>>
>>
>>
>> I tried pyPdf once, just for fun, and it was nice:
>> http://pybrary.net/pyPdf/
>> -- 
>> لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه
>> كالحقيقة.....محمد الغزالي
>> "No victim has ever been more repressed and alienated than the truth"
>>
>> Emad Soliman Nawfal
>> Indiana University, Bloomington
>> --------------------------------------------------------
