[Tutor] PDF to text conversion

johnf jfabiani at yolo.com
Tue Apr 21 19:49:31 CEST 2009


On Tuesday 21 April 2009 10:36:59 am Robert Berman wrote:
> Bob,
>
> Thank you for the quick reply. I am acquainted with that method, and
> that will certainly work to do some really serious testing; but, the
> data collection is an ongoing process and  the users are requesting that
> every month the latest entries (8) are brought into the system. What is
> rather irksome is that the output from the system cannot be changed from
> PDF to text; so obviously I am going to have to resolve the situation at
> my end.
>
> I am envisioning a simple program that once started reads the data file,
> converts the data into text, and then sends the data to the database.
> The program doesn't care if there are 8 test results or 80,000 test
> results. That is why I am looking for a Python module.
>
> Thanks again,
>
> Robert Berman
>
> bob gailer wrote:
> > Robert Berman wrote:
> >> Hi,
> >>
> >> I must convert a history file in PDF format that goes from May of
> >> 1988 to current date.  Readings are taken twice weekly and consist of
> >> the date taken mm/dd/yy and the results appearing as a 10 character
> >> numeric + special characters sequence. This is obviously an easy
> >> setup for a very small database  application with the date as the
> >> key, the result string as the data.
> >>
> >> My problem is converting the PDF file into a text file which I can
> >> then read and process. I do not see any free python libraries having
> >> this capacity. I did see a PDFPILOT program for Windows but this
> >> application is being developed on Linux and should also run on
> >> Windows; so I do not want to incorporate a Windows only application.
> >>
> >> I do not think I am breaking any new frontiers with this application.
> >> Have any of you worked with such a library, or do you know of one or
> >> two I can download and work with? Hopefully, they have reasonable
> >> documentation.
> >
> > If this is a one-time conversion, just use the "save as text" feature of
> > Adobe Reader.
> >
> >> My development environment is:
> >>
> >> Python
> >> Linux
> >> Ubuntu version 8.10
> >>
> >>
> >> Thanks for any help  you might be able to offer.
> >>
> >>
> >> Robert Berman

On Linux, pdftotext is available, and you might want to check out Ghostscript,
which runs on both Windows and Linux.
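
For the monthly import, something along these lines might be a starting point.
It is only a rough sketch: it assumes pdftotext is on the PATH, and it assumes
each reading comes out as one line holding a mm/dd/yy date followed by the
10-character result. I have not seen the real file, so the pattern and the
function names (pdf_to_text, parse_readings) are made up for illustration and
will likely need adjusting.

```python
import re
import subprocess

# Assumed line layout: a mm/dd/yy date, whitespace, then a 10-character result.
READING = re.compile(r"^\s*(\d{2}/\d{2}/\d{2})\s+(\S{10})\s*$")


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text ('-' sends output to stdout)."""
    done = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def parse_readings(text):
    """Return (date, result) pairs for every line matching the assumed layout."""
    pairs = []
    for line in text.splitlines():
        m = READING.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs
```

Calling parse_readings(pdf_to_text("history.pdf")) would then give a list of
(date, result) pairs ready to insert into the database, whether the file holds
8 readings or 80,000. The file name here is just a placeholder.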



-- 
John Fabiani

