Errors with PyPdf
Dave Angel
davea at ieee.org
Mon Sep 27 00:46:18 EDT 2010
On 2:59 PM, flebber wrote:
> <snip>
> Traceback (most recent call last):
>   File "C:/Python26/Pdfread", line 16, in <module>
>     open('x.txt', 'w').write(content)
> NameError: name 'content' is not defined
> When I use:
>
> import pyPdf
>
> def getPDFContent(path):
>     content = "C:\Components-of-Dot-NET.txt"
>     # Load PDF into pyPDF
>     pdf = pyPdf.PdfFileReader(file(path, "rb"))
>     # Iterate pages
>     for i in range(0, pdf.getNumPages()):
>         # Extract text from page and add to content
>         content += pdf.getPage(i).extractText() + "\n"
>     # Collapse whitespace
>     content = " ".join(content.replace(u"\xa0", " ").strip().split())
>     return content
>
> print getPDFContent(r"C:\Components-of-Dot-NET.pdf").encode("ascii",
>     "ignore")
> open('x.txt', 'w').write(content)
>
There's no global variable named content; the one you assigned was local
to the function, so it's lost when the function exits. The function does
return the value, but you hand it straight to print and don't save it
anywhere. Save the result in a variable first, then write that:
data = getPDFContent(r"C:\Components-of-Dot-NET.pdf").encode("ascii",
    "ignore")
outfile = open('x.txt', 'w')
outfile.write(data)
outfile.close()
I used a different name to emphasize that this is *not* the same
variable as content inside the function. In this case, it happens to
have the same value. And if you used the same name, you could be
confused about which is which.
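
If it helps to see the scoping rule in isolation, here is a minimal sketch with no PDF involved. The function get_content() and the strings are made up for illustration; they stand in for getPDFContent() and its result.

```python
def get_content():
    # This 'content' is local to get_content(); it vanishes on return
    content = "text from the PDF"
    return content

# At this point no module-level name 'content' exists,
# so referring to it raises NameError (the error in the traceback)
try:
    content
    defined_before = True
except NameError:
    defined_before = False

# Keep the returned value under a name of our own
data = get_content()

# A module-level 'content' is a separate variable that only shares the name;
# assigning to it does not affect the local one inside the function
content = "something else"
still_local = get_content()
```

After this runs, data and still_local both hold the function's return value, while the module-level content holds the unrelated string.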
DaveA