Errors with PyPdf

Sun Sep 26 19:38:39 EDT 2010

On Sep 26, 7:10 pm, flebber <flebber.c... at gmail.com> wrote:
> I was trying to use pyPdf following a recipe from the ActiveState
> cookbooks. However, I cannot get it to work. I am unsure whether it is
> me or whether it is because sets are deprecated.
>
> I have placed a PDF in my C:\ drive. It is called
> "Components-of-Dot-NET.pdf". You could use anything; I was just
> testing with it.
>
> I was using the last script on that page, which is the most recently
> updated one. I am using Python 2.6.
>
> http://code.activestate.com/recipes/511465-pure-python-pdf-to-text-co...
>
> import pyPdf
>
> def getPDFContent(path):
>     content = "C:\Components-of-Dot-NET.pdf"
>     # Load PDF into pyPDF
>     pdf = pyPdf.PdfFileReader(file(path, "rb"))
>     # Iterate pages
>     for i in range(0, pdf.getNumPages()):
>         # Extract text from page and add to content
>         content += pdf.getPage(i).extractText() + "\n"
>     # Collapse whitespace
>     content = " ".join(content.replace(u"\xa0", " ").strip().split())
>     return content
>
> print getPDFContent("Components-of-Dot-NET.pdf").encode("ascii", "ignore")
>
> This is my error.
>
>
>
> Warning (from warnings module):
>   File "C:\Documents and Settings\Family\Application Data\Python
> \Python26\site-packages\pyPdf\pdf.py", line 52
>     from sets import ImmutableSet
> DeprecationWarning: the sets module is deprecated
>
> Traceback (most recent call last):
>   File "C:/Python26/Pdfread", line 15, in <module>
>     print getPDFContent("Components-of-Dot-NET.pdf").encode("ascii",
> "ignore")
>   File "C:/Python26/Pdfread", line 6, in getPDFContent
>     pdf = pyPdf.PdfFileReader(file(path, "rb"))
---> IOError: [Errno 2] No such file or directory: 'Components-of-Dot-
> NET.pdf'
>
>
>
>
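Looks like an issue with finding the file, not with the sets warning.
The traceback shows the bare name 'Components-of-Dot-NET.pdf' being
opened, so Python looks for it in the current working directory rather
than in C:\. How do you pass the path?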
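
For what it's worth, here is a minimal sketch of one way to check the path before handing it to pyPdf. It assumes the file really is at C:\Components-of-Dot-NET.pdf as described above. resolve_pdf_path and its base_dir argument are hypothetical names made up for this example and are not part of pyPdf.

```python
import os


def resolve_pdf_path(path, base_dir=None):
    """Return an absolute path for `path`, or raise IOError if no file is there.

    Hypothetical helper for illustration only. A relative `path` is
    resolved against `base_dir`, which defaults to the current working
    directory (the same place a bare open() call would look).
    """
    if base_dir is None:
        base_dir = os.getcwd()
    # os.path.join leaves `path` untouched when it is already absolute.
    full_path = os.path.abspath(os.path.join(base_dir, path))
    if not os.path.isfile(full_path):
        raise IOError("No such file: %r (current directory is %r)"
                      % (full_path, os.getcwd()))
    return full_path


# Assumed usage with the script from the question (not run here):
#   full = resolve_pdf_path(r"C:\Components-of-Dot-NET.pdf")
#   pdf = pyPdf.PdfFileReader(open(full, "rb"))
```

Passing the full path as a raw string (the r"..." form) keeps the backslash from being read as an escape. If the error persists, the message from the helper shows which directory was actually searched.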