[Tutor] PDF to text conversion
Robert Berman
bermanrl at cfl.rr.com
Wed Apr 22 14:00:43 CEST 2009
Dinesh,
I have pdftotext version 3.0.0 and have decided to use it to go from
PDF to text. It is not the ideal solution, but it is certainly a doable
one.
Thank you,
Robert
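For readers following the thread, here is a minimal sketch of calling pdftotext from Python with the -layout option Dinesh recommends. It swaps os.system() for subprocess.run() with an argument list, which avoids shell quoting problems with file names containing spaces. The helper names build_pdftotext_cmd and pdf_to_text are hypothetical, and it assumes the pdftotext binary from xpdf is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path, txt_path=None, layout=True):
    """Build the argument list for the pdftotext command-line tool (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout keeps the original physical layout of the page text.
        cmd.append("-layout")
    cmd.append(str(pdf_path))
    if txt_path is not None:
        cmd.append(str(txt_path))
    return cmd


def pdf_to_text(pdf_path, txt_path=None, layout=True):
    """Convert pdf_path to a text file with pdftotext; return the output path."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on the PATH")
    pdf_path = Path(pdf_path)
    if txt_path is None:
        # pdftotext itself defaults to file.pdf -> file.txt; make that explicit.
        txt_path = pdf_path.with_suffix(".txt")
    # A list of arguments (no shell) is passed directly to the program;
    # check=True raises CalledProcessError if pdftotext fails.
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path, layout), check=True)
    return Path(txt_path)
```

Usage would look like pdf_to_text("history.pdf"), where "history.pdf" is a placeholder name for the file being converted.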
Dinesh B Vadhia wrote:
> The best converter so far is pdftotext from
> http://www.glyphandcog.com/ who maintain an open source project at
> http://www.foolabs.com/xpdf/.
>
> It's not a Python library, but you can call pdftotext from within Python
> using os.system(). I used the pdftotext -layout option and that gave
> the best result. hth.
>
> dinesh
>
> ------------------------------------------------------------------------
> Message: 4
> Date: Tue, 21 Apr 2009 18:37:39 -0400
> From: Robert Berman <bermanrl at cfl.rr.com>
> Subject: Re: [Tutor] PDF to text conversion
> To: tutor at python.org
> Message-ID: <49EE4AB3.4040103 at cfl.rr.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> First, thanks to everyone who contributed to this thread. I have a
> number of possible solutions and a number of paths to pursue to
> determine which avenue I should take to resolve this remaining issue. I
> did try the itools library and while everything installed nicely, most
> of the tests failed so I am not particularly overjoyed with the results.
>
> Thank you Dinesh for the vote of sympathy. I do appreciate it.
>
> I did use Adobe Reader to convert the history PDF file into a text file,
> and it did seem to do it faithfully. So now I will work out a parsing
> function to extract my data and send it to an SQLite database.
>
> I am thrilled both with the number of suggestions I have received from
> this group and the quality of the suggestions.
>
> Thanks again,
>
> Robert Berman
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>
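The parse-and-store step Robert describes could be sketched with the standard sqlite3 module. The actual layout of the history file is not given in the thread, so this assumes, purely for illustration, that each useful line holds a date and a free-text entry separated by two or more spaces (as pdftotext -layout output often is). The table name history, its columns, and the helpers parse_line and load_history are all hypothetical.

```python
import re
import sqlite3


def parse_line(line):
    """Split a hypothetical 'date<2+ spaces>entry' line; return None for blank lines."""
    line = line.strip()
    if not line:
        return None
    # Split on the first run of two or more whitespace characters only,
    # so single spaces inside the entry text are preserved.
    parts = re.split(r"\s{2,}", line, maxsplit=1)
    date = parts[0]
    entry = parts[1] if len(parts) > 1 else ""
    return date, entry


def load_history(lines, conn):
    """Insert parsed lines into a hypothetical 'history' table; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS history (date TEXT, entry TEXT)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO history (date, entry) VALUES (?, ?)", rows)
    return len(rows)
```

With a real file, the lines could come from open("history.txt", encoding="latin-1") (the encoding is a guess) and the connection from sqlite3.connect("history.db"); parse_line is the part that would need adapting to the file's actual format.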