<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content=text/html;charset=iso-8859-1>
<META content="MSHTML 6.00.6000.16825" name=GENERATOR></HEAD>
<BODY id=MailContainerBody
style="PADDING-RIGHT: 10px; PADDING-LEFT: 10px; PADDING-TOP: 15px" leftMargin=0
topMargin=0 CanvasTabStop="true" name="Compose message area">
<DIV><FONT face=Garamond color=#000080>The best converter I have found so far is pdftotext
from </FONT><A href="http://www.glyphandcog.com/"><FONT
title="http://www.glyphandcog.com/ CTRL + Click to follow link"
face=Garamond color=#000080>http://www.glyphandcog.com/</FONT></A><FONT
face=Garamond color=#000080>, who maintain an open-source project at
</FONT><A title="http://www.foolabs.com/xpdf/ CTRL + Click to follow link"
href="http://www.foolabs.com/xpdf/"><FONT
title="http://www.foolabs.com/xpdf/ CTRL + Click to follow link"
face=Garamond color=#000080>http://www.foolabs.com/xpdf/</FONT></A><FONT
face=Garamond color=#000080>.</FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><FONT face=Garamond
color=#000080></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Garamond color=#000080>It's not a Python library, but you can
call pdftotext from within Python using os.system(). I used the pdftotext
-layout option, and that gave the best result. </FONT><FONT
face=Garamond><FONT color=#000080><FONT face=Garamond
color=#000080>hth.</FONT></FONT></FONT></DIV>
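<DIV><FONT face=Garamond color=#000080>For what it's worth, here is a minimal sketch of that call. It assumes Python 3 and that the pdftotext binary is on your PATH. I have swapped in subprocess.run() for os.system() so the file names are passed as a list and never go through a shell. The function names and the file names are made up for illustration.</FONT></DIV>
<PRE>
```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path):
    """Return the argument list for: pdftotext -layout PDF-file text-file."""
    # -layout asks pdftotext to keep the physical layout of the page.
    return ["pdftotext", "-layout", pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Run pdftotext and return the path of the text file it wrote."""
    # Assumption: pdftotext is installed separately and visible on PATH.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path


# Hypothetical usage (file names are placeholders):
# pdf_to_text("history.pdf", "history.txt")
```
</PRE>
<DIV><FONT face=Garamond color=#000080>After the call, the text file can be opened and parsed like any other. If you would rather stay with os.system(), the same arguments apply, but take care to quote file names that contain spaces.</FONT></DIV>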
<DIV><FONT face=Garamond><FONT color=#000080><FONT face=Garamond
color=#000080></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Garamond><FONT color=#000080><FONT face=Garamond
color=#000080>dinesh</FONT></FONT></FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><FONT face=Garamond
color=#000080></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Garamond color=#000080>
<HR>
</FONT></DIV>
<DIV><FONT face=Garamond color=#000080>Message: 4<BR>Date: Tue, 21 Apr 2009
18:37:39 -0400<BR>From: Robert Berman <</FONT><A
title="mailto:bermanrl@cfl.rr.com CTRL + Click to follow link"
href="mailto:bermanrl@cfl.rr.com"><FONT
title="mailto:bermanrl@cfl.rr.com CTRL + Click to follow link" face=Garamond
color=#000080>bermanrl@cfl.rr.com</FONT></A><FONT face=Garamond
color=#000080>><BR>Subject: Re: [Tutor] PDF to text conversion<BR>To:
</FONT><A href="mailto:tutor@python.org"><FONT face=Garamond
color=#000080>tutor@python.org</FONT></A><BR><FONT face=Garamond
color=#000080>Message-ID: <</FONT><A
title="mailto:49EE4AB3.4040103@cfl.rr.com CTRL + Click to follow link"
href="mailto:49EE4AB3.4040103@cfl.rr.com"><FONT face=Garamond
color=#000080>49EE4AB3.4040103@cfl.rr.com</FONT></A><FONT face=Garamond
color=#000080>><BR>Content-Type: text/plain; charset=ISO-8859-1;
format=flowed<BR><BR>First, thanks to everyone who contributed to this thread. I
have a <BR>number of possible solutions and a number of paths to pursue to
<BR>determine which avenue I should take to resolve this remaining issue. I
<BR>did try the itools library and while everything installed nicely, most
<BR>of the tests failed so I am not particularly overjoyed with the
results.<BR><BR>Thank you Dinesh for the vote of sympathy. I do appreciate
it.<BR><BR>I did use Adobe Reader to convert the history PDF file into a text
file <BR>and it did seem to do it faithfully. So now I will work out a parsing
<BR>function to extract my data and send it to a SQLite database.<BR><BR>I am
thrilled both with the number of suggestions I have received from <BR>this group
and the quality of the suggestions.<BR><BR>Thanks again,<BR><BR>Robert
Berman<BR><BR></FONT></DIV></BODY></HTML>