[Tutor] Reading big files
alan.gauld@bt.com
Wed, 24 Nov 1999 12:00:04 -0000
> I have a PDF file I want to read. I used the following code.
>
> f=open('D:\\Documents\\MyFile')
You'll need to open it in binary mode:
f=open('D:\\Documents\\MyFile', 'rb')
Actually I assume your file ends in .pdf so you'll need that too!
f=open('D:\\Documents\\MyFile.PDF', 'rb')
Make sure Explorer's view settings are set to show the full
filename; the default hides known extensions.
Then you'll need to find a reference for the internal format of
the PDF file and decode the binary bytes that you read.
> Where I am wrong. Is the file too big. Has it a special
> character in it that doesn't allow to read the entire content.
It probably contains a byte that looks like EOF when the file
is read in text mode. On DOS/Windows the usual suspect is
Ctrl-Z (0x1A).
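
To test that guess, here is a small sketch that reads nothing itself but searches bytes you have already read in binary mode for a Ctrl-Z (0x1A), the byte that DOS/Windows text-mode reads have traditionally treated as end-of-file. Whether that is really what stopped your read is an assumption; first_ctrl_z is just a name I made up:

```python
CTRL_Z = b'\x1a'  # traditional DOS/Windows text-mode EOF marker

def first_ctrl_z(data):
    """Return the offset of the first 0x1A byte in data, or -1 if there is none."""
    return data.find(CTRL_Z)
```

If the offset it returns matches the point where your text-mode read stopped, that byte is very likely the culprit.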
One thing to try is to open the file using debug from a DOS prompt:
C:> debug foo.pdf
Use 'd' at the prompt to dump a listing of the file in hex.
There is also an ASCII listing on the right. Compare the
characters on the right with the hex patterns on the left
- that may help you decode the PDF format sufficiently
well to extract the text you want....
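
If you would rather stay in Python, something like the following gives a similar listing: offset, hex bytes on the left, printable ASCII on the right. It is only a sketch (hex_dump is my own name and the layout is approximate, not an exact copy of debug's output), and it assumes data is a bytes object read in 'rb' mode:

```python
def hex_dump(data, width=16):
    """Return a hex/ASCII listing of data, width bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two hex digits per byte, separated by spaces.
        hex_part = ' '.join('%02X' % b for b in chunk)
        # Printable ASCII shown as-is, everything else as a dot.
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last line.
        lines.append('%08X  %-*s  %s' % (offset, width * 3 - 1, hex_part, text_part))
    return '\n'.join(lines)
```

For example, print(hex_dump(open('foo.pdf', 'rb').read(256))) lists the first 256 bytes.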
Alan G.