[Tutor] PDF to text conversion
Robert Berman
bermanrl at cfl.rr.com
Wed Apr 22 00:37:39 CEST 2009
First, thanks to everyone who contributed to this thread. I have a
number of possible solutions and a number of paths to pursue to
determine which avenue I should take to resolve this remaining issue. I
did try the itools library and, while everything installed nicely, most
of the tests failed, so I am not particularly overjoyed with the results.
Thank you Dinesh for the vote of sympathy. I do appreciate it.
I did use Adobe Reader to convert the history PDF file into a text file
and it did seem to do it faithfully. So now I will work out a parsing
function to extract my data and send it to a SQLite database.
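
A minimal sketch of that parsing step, using only Python's standard re and
sqlite3 modules. The record layout (a date, a description, and an amount
on each line), the table name "history", and the function names are all
assumptions made up for illustration; the real text file will almost
certainly need a different pattern.

```python
import re
import sqlite3

# Hypothetical record layout, e.g. "2009-04-21  Opening balance  100.00".
# Adjust this pattern to whatever the converted text file really contains.
RECORD_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+(-?\d+\.\d{2})$")


def parse_records(lines):
    """Yield (date, description, amount) for each line matching RECORD_RE."""
    for line in lines:
        match = RECORD_RE.match(line.strip())
        if match:  # page headers, blank lines, etc. are silently skipped
            date, description, amount = match.groups()
            yield date, description, float(amount)


def load_into_sqlite(lines, db_path=":memory:"):
    """Store the parsed records in a 'history' table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history "
        "(date TEXT, description TEXT, amount REAL)"
    )
    # Parameterized inserts avoid quoting problems in the descriptions.
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", parse_records(lines))
    conn.commit()
    return conn
```

To use it on the converted file, something like
load_into_sqlite(open("history.txt"), "history.db") should do, where both
file names are placeholders.
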
I am thrilled both with the number of suggestions I have received from
this group and the quality of the suggestions.
Thanks again,
Robert Berman
Norman Khine wrote:
> The itools library from hforge.org has a PDF2TEXT implementation, itools.pdf
>
> http://www.hforge.org/itools
>
> norman
>
> On Tue, Apr 21, 2009 at 8:44 PM, Dayo Adewunmi <contactdayo at gmail.com> wrote:
>
>> Emile van Sebille wrote:
>>
>>> Robert Berman wrote:
>>> <snip>
>>>
>>>
>>>> Have any of you worked with such a library, or do you know of one or two
>>>> I can download and work with? Hopefully, they have reasonable documentation.
>>>>
>>>> My development environment is:
>>>>
>>>> Python
>>>> Linux
>>>> Ubuntu version 8.10
>>>>
>>>>
>>> I've used
>>>
>>> [root@fcfw2 /]# /usr/bin/pdftotext -v
>>> pdftotext version 2.01
>>> Copyright 1996-2002 Glyph & Cog, LLC
>>> [root@fcfw2 /]# cat /etc/issue
>>> Red Hat Linux release 9 (Shrike)
>>>
>>>
>>> HTH,
>>>
>>> Emile
>>>
>>> _______________________________________________
>>> Tutor maillist - Tutor at python.org
>>> http://mail.python.org/mailman/listinfo/tutor
>>>
>>>
>> Hi Robert,
>> pdftotext is part of poppler-utils, an Ubuntu package which can be installed
>> like so:
>>
>> sudo aptitude install poppler-utils
>>
>> But I too would be interested in finding a Python library/module for this.
>>
>> Regards,
>>
>> Dayo