[Tutor] PDF to text conversion

Robert Berman bermanrl at cfl.rr.com
Wed Apr 22 14:00:43 CEST 2009


Dinesh,

I have pdftotext version 3.0.0.  I have decided to use this to go from 
PDF to text. It is not the ideal solution, but it is certainly a doable 
solution.
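
For reference, the conversion step can be scripted from Python. What 
follows is only a minimal sketch, assuming a current Python 3 and that 
the pdftotext binary is on the PATH. It uses subprocess.run rather than 
the os.system() call suggested in the quoted message, and the function 
names and the file names history.pdf and history.txt are hypothetical.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path):
    """Return the argument list for: pdftotext -layout PDF-file text-file."""
    # -layout asks pdftotext to keep the physical layout of the page.
    return ["pdftotext", "-layout", pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Convert pdf_path to txt_path; raise if pdftotext is missing or fails."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)


# Hypothetical usage (file names are placeholders):
# pdf_to_text("history.pdf", "history.txt")
```

Passing the arguments as a list avoids shell quoting problems with file 
names that contain spaces.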

Thank you,

Robert

Dinesh B Vadhia wrote:
> The best converter so far is pdftotext from 
> http://www.glyphandcog.com/ who maintain an open source project at 
> http://www.foolabs.com/xpdf/.
>  
> It's not a Python library but you can call pdftotext from within Python 
> using os.system().  I used the pdftotext -layout option and that gave 
> the best result.  hth.
>  
> dinesh
>  
> ------------------------------------------------------------------------
> Message: 4
> Date: Tue, 21 Apr 2009 18:37:39 -0400
> From: Robert Berman <bermanrl at cfl.rr.com>
> Subject: Re: [Tutor] PDF to text conversion
> To: tutor at python.org
> Message-ID: <49EE4AB3.4040103 at cfl.rr.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> First, thanks to everyone who contributed to this thread. I have a
> number of possible solutions and a number of paths to pursue to
> determine which avenue I should take to resolve this remaining issue. I
> did try the itools library and while everything installed nicely, most
> of the tests failed so I am not particularly overjoyed with the results.
>
> Thank you Dinesh for the vote of sympathy. I do appreciate it.
>
> I did use Adobe Reader to convert the history PDF file into a text file
> and it did seem to do it faithfully. So now I will work out a parsing
> function to extract my data and send it to a SQLite database.
>
> I am thrilled both with the number of suggestions I have received from
> this group and the quality of the suggestions.
>
> Thanks again,
>
> Robert Berman
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>   

