[OPy] PDF page splitter
Eric Bruning
eric at deeplycloudy.com
Tue Nov 13 21:18:16 CET 2007
Since it was hard to see on the screen during the meeting today, I'm
passing along the code that splits a PDF into single pages. It's
designed around the idea of taking a single PDF containing lots of
single-page letters, each addressed to a different person with a "Dear
SoAndSo:" salutation. Each page is saved under the recipient's name when
one can be found, and as "page N" otherwise. Change the regular
expressions to meet your needs.
Required: http://pybrary.net/pyPdf/
I'd be happy to add this to the wiki, but wasn't sure where to put it.
-Eric
from pyPdf import PdfFileWriter, PdfFileReader
import re

filename = "input.pdf"
# Prepended to every output file name.
CommonFilePrefix = " "

# Captures the name used in the salutation, e.g. "Dear Jane:" -> "Jane".
isSalutationMatch = r"Dear (.+?):"
SalutationMatch = re.compile(isSalutationMatch)

original = PdfFileReader(file(filename, "rb"))

for pagenumber, page in enumerate(original.pages):
    pagenumberNormal = pagenumber + 1    # 1-based, for file names
    text = page.extractText()

    FirstNameMatch = SalutationMatch.search(text)
    if FirstNameMatch:
        FirstName = FirstNameMatch.group(1)
        # Look for the first name followed by another word (the surname),
        # e.g. in the address block. Escape the name so that characters
        # such as "." are matched literally.
        isFullNameMatch = re.escape(FirstName) + r"\s*[a-zA-Z]+"
        FullNameMatch = re.compile(isFullNameMatch)
        SpecificFileSuffixMatch = FullNameMatch.search(text)
        if SpecificFileSuffixMatch:
            SpecificFileSuffix = SpecificFileSuffixMatch.group(0)
        else:
            SpecificFileSuffix = "page %i" % pagenumberNormal
    else:
        # No salutation on this page: fall back to the page number.
        SpecificFileSuffix = "page %i" % pagenumberNormal

    print pagenumberNormal, SpecificFileSuffix

    # Write this page out as its own one-page PDF.
    output = PdfFileWriter()
    output.addPage(page)
    outputStream = file(CommonFilePrefix + SpecificFileSuffix + ".pdf", "wb")
    output.write(outputStream)
    outputStream.close()
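
If you want to try out a changed regular expression before running it over a whole PDF, the naming step can be pulled out on its own and fed plain text. This is only a rough sketch: the function name page_suffix, its salutation parameter, and the sample letter text are made up for illustration, and it uses nothing beyond the re module (no pyPdf). It also escapes the captured name with re.escape so that punctuation in a name is matched literally.

```python
import re


def page_suffix(text, page_number, salutation=r"Dear (.+?):"):
    """Return a file name suffix for one page of extracted text.

    Hypothetical helper: uses the recipient's full name when it can be
    found, and "page N" otherwise.
    """
    fallback = "page %i" % page_number

    # Step 1: find the name used in the salutation.
    first = re.search(salutation, text)
    if not first:
        return fallback

    # Step 2: find that name followed by one more word (the surname).
    full = re.search(re.escape(first.group(1)) + r"\s*[a-zA-Z]+", text)
    if full:
        return full.group(0)
    return fallback


# Made-up letter text standing in for what text extraction might return.
sample = "Jane Doe\n123 Main St.\n\nDear Jane:\nThank you for your letter."
print(page_suffix(sample, 1))  # prints: Jane Doe
```

A page with no salutation, or one where the name never appears next to a second word, falls back to "page N". Passing a different pattern as salutation (with one capturing group for the name) is a quick way to check a new regular expression against a few lines of text.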