Errors with PyPdf

Dave Angel davea at ieee.org
Mon Sep 27 06:46:18 CEST 2010



On 2:59 PM, flebber wrote:
> <snip>
> Traceback (most recent call last):
>    File "C:/Python26/Pdfread", line 16, in <module>
>      open('x.txt', 'w').write(content)
> NameError: name 'content' is not defined
> When i use.
>
> import pyPdf
>
> def getPDFContent(path):
>      content = "C:\Components-of-Dot-NET.txt"
>      # Load PDF into pyPDF
>      pdf = pyPdf.PdfFileReader(file(path, "rb"))
>      # Iterate pages
>      for i in range(0, pdf.getNumPages()):
>          # Extract text from page and add to content
>          content += pdf.getPage(i).extractText() + "\n"
>      # Collapse whitespace
>      content = " ".join(content.replace(u"\xa0", " ").strip().split())
>      return content
>
> print getPDFContent(r"C:\Components-of-Dot-NET.pdf").encode("ascii",
> "ignore")
> open('x.txt', 'w').write(content)
>
There's no global variable named content; that name was local to the
function, so it's lost when the function exits.  The function does
return the value, but you hand it straight to print and don't save it
anywhere.  Save the return value first, then write it out:

data = getPDFContent(r"C:\Components-of-Dot-NET.pdf").encode("ascii",
"ignore")

outfile = open('x.txt', 'w')
outfile.write(data)

outfile.close()

I used a different name to emphasize that this is *not* the same
variable as content inside the function.  In this case, it happens to
hold the value the function returned (after encoding).  If you used the
same name for both, you could be confused about which is which.
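
Here is a minimal sketch of the same point that runs without pyPdf or a
PDF on disk.  It is written for Python 3, and collapse_whitespace, the
sample string, and the temp-file location are made-up stand-ins, not
part of your script:

```python
import os
import tempfile

def collapse_whitespace(text):
    # 'content' is local: it exists only while this call is running.
    content = " ".join(text.replace(u"\xa0", " ").split())
    return content

# Bind the returned value to a module-level name of our own choosing.
data = collapse_whitespace("Components \xa0 of\n Dot NET ")

# The function's local name never becomes a global.
leaked = "content" in globals()

# Hypothetical output location; 'with' closes the file for us.
out_path = os.path.join(tempfile.mkdtemp(), "x.txt")
with open(out_path, "w") as outfile:
    outfile.write(data)
```

Only data survives the call; referring to content at module level would
raise the same NameError you saw.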


DaveA



