[OPy] PDF page splitter

Eric Bruning eric at deeplycloudy.com
Tue Nov 13 21:18:16 CET 2007


Since it was hard to see on the screen during the meeting today, I'm
passing along the code that splits a PDF into single pages. It's
designed around the idea of taking a single PDF containing lots of
single-page letters addressed to different people, each with a "Dear
SoAndSo:" salutation, and saving every page as its own PDF named after
the recipient (or "page N" when no name is found). Change the regular
expressions to meet your needs.
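If you want to try different regular expressions without a PDF at hand, you can pull the name lookup out into a plain function and test it on strings. This is a minimal sketch of the same two-step match the script below uses: find the salutation first, then find the first occurrence of that name followed by another word. `suffix_for_page` and its `salutation` parameter are hypothetical names made up for illustration, and the sketch needs only the standard `re` module. Escaping the captured name with `re.escape` and collapsing stray line breaks into spaces are additions here.

```python
import re

# Hypothetical helper (not part of the original script): choose a filename
# suffix for one page of extracted text using the same two-step regex idea.
def suffix_for_page(text, pagenumber, salutation=r"Dear (.+?):"):
    """Return the full name found on the page, or "page N" if none is found.

    pagenumber is 1-based. salutation must contain one capture group
    holding the name that follows the greeting.
    """
    first = re.search(salutation, text)
    if first:
        # Escape the captured name so characters like "." match literally,
        # then require at least one more word (e.g. a surname) after it.
        full_pattern = re.escape(first.group(1)) + r"\s*[a-zA-Z]+"
        full = re.search(full_pattern, text)
        if full:
            # Collapse newlines/tabs left by text extraction into single spaces.
            return " ".join(full.group(0).split())
    return "page %i" % pagenumber
```

For letters that open with, say, "Hello Ann," you would call `suffix_for_page(text, n, salutation=r"Hello (\w+),")`.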

Required: http://pybrary.net/pyPdf/

I'd be happy to add this to the wiki, but wasn't sure where to put it.

-Eric



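from pyPdf import PdfFileWriter, PdfFileReader
import re

filename = "input.pdf"
# Text prepended to every output filename
CommonFilePrefix = " "

# Salutation line; group(1) captures the name after "Dear",
# e.g. "Dear Jane:" -> "Jane"
isSalutationMatch = r"Dear (.+?):"
SalutationMatch = re.compile(isSalutationMatch)


original = PdfFileReader(open(filename, "rb"))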


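for pagenumber, page in enumerate(original.pages):

    # 1-based page number for messages and fallback filenames
    pagenumberNormal = pagenumber + 1

    text = page.extractText()

    # Look for the salutation and pull out the name after "Dear"
    FirstNameMatch = SalutationMatch.search(text)
    if FirstNameMatch:
        FirstName = FirstNameMatch.group(1)
        # First occurrence of that name followed by another word
        # (e.g. the recipient's name in the address block); the name is
        # escaped so any punctuation in it is matched literally
        isFullNameMatch = re.escape(FirstName) + r"\s*[a-zA-Z]+"
        FullNameMatch = re.compile(isFullNameMatch)

        SpecificFileSuffixMatch = FullNameMatch.search(text)
        if SpecificFileSuffixMatch:
            # Collapse newlines/tabs from text extraction into single spaces
            SpecificFileSuffix = " ".join(SpecificFileSuffixMatch.group(0).split())
        else:
            SpecificFileSuffix = "page %i" % pagenumberNormal
    else:
        SpecificFileSuffix = "page %i" % pagenumberNormal

    print pagenumberNormal, SpecificFileSuffix

    # Write this page on its own to a new single-page PDF
    output = PdfFileWriter()
    output.addPage(page)

    outputStream = open(CommonFilePrefix + SpecificFileSuffix + ".pdf", "wb")
    output.write(outputStream)
    outputStream.close()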
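One caveat when running the script: if two letters go to the same person, the later page overwrites the earlier file, because each output is opened with "wb". The sketch below shows one way to avoid that, assuming you keep a set of the names already used in this run. `unique_filename` is a hypothetical helper (not part of pyPdf or the script above). It only tracks names produced during the current run and does not check for files left on disk by earlier runs. It also replaces characters such as "/" that are not allowed in filenames.

```python
import re

# Hypothetical helper (not part of pyPdf or the original script).
def unique_filename(prefix, suffix, used, ext=".pdf"):
    """Build prefix + suffix + ext, appending " (2)", " (3)", ... if that
    name was already produced in this run. `used` is a set kept by the
    caller across pages; the chosen name is added to it."""
    # Replace characters that are unsafe in filenames on common systems.
    safe = re.sub(r'[\\/:*?"<>|]', "_", suffix).strip()
    name = prefix + safe + ext
    n = 2
    while name in used:
        name = "%s%s (%d)%s" % (prefix, safe, n, ext)
        n += 1
    used.add(name)
    return name
```

In the loop, you would create `used = set()` before the `for` statement and open each output with `open(unique_filename(CommonFilePrefix, SpecificFileSuffix, used), "wb")`.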