Using PIL to find separator pages

Larry Bates larry.bates at websafe.com
Fri Jun 1 11:51:11 EDT 2007


Steve Holden wrote:
> Larry Bates wrote:
>> I have a project that I wanted to solicit some advice
>> on from this group.  I have millions of pages of scanned
>> documents with each page in an individual .JPG file.
>> When the documents were scanned the people that did
>> the scanning put a colored (hot pink) separator page
>> between the individual documents.  I was wondering if
>> there was any way to utilize PIL to scan through the
>> individual files, look at some small section on the
>> page, and determine if it is a separator page by
>> somehow comparing the color to the separator page
>> color?  I realize that this would be some sort of
>> percentage match where 100% would be a perfect match
>> and any number lower would indicate that it was less
>> likely that it was a separator page.
>>
>> Thanks in advance for any thoughts or advice.
>>
> I suspect the easiest way would be to select a few small patches of each
> image and average the color values of the pixels, then normalize to hue
> rather than RGB.
> 
> Close enough to the hue you want (and you could include saturation and
> intensity too, if you felt like it) across several areas of the page
> would be a hit for a separator.
> 
> regards
>  Steve

Steve,

I'm completely lost on how to proceed.  I don't know how to average color
values, normalize to hue...  Any guidance you could give would be greatly
appreciated.

Thanks in advance,
Larry
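
For reference, here is a rough sketch of what "average the color values,
then normalize to hue" could look like. It is only one possible reading of
Steve's suggestion, not a tested solution. Everything named here is an
assumption made for illustration: the helper names, the TARGET_RGB value
(the CSS "HotPink" color, which a real scanned separator page will probably
not match exactly), the tolerances, and the patch boxes. It uses PIL's
ImageStat module for the per-patch mean and the standard colorsys module
for the RGB-to-HSV conversion.

```python
import colorsys

# Assumed reference color: CSS "HotPink" is RGB (255, 105, 180).
# Measure an actual scanned separator page and substitute its values.
TARGET_RGB = (255, 105, 180)


def rgb_to_hsv(rgb):
    """Convert 0-255 RGB to (hue, saturation, value), each in 0.0-1.0."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def hue_distance(h1, h2):
    """Distance between two hues around the color wheel (0.0 to 0.5)."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


def looks_like_target(rgb, target=TARGET_RGB, hue_tol=0.05, min_sat=0.3):
    """True if rgb is close in hue to target and not washed out.

    hue_tol and min_sat are guesses; tune them on real scans.
    """
    h, s, _ = rgb_to_hsv(rgb)
    target_h, _, _ = rgb_to_hsv(target)
    return s >= min_sat and hue_distance(h, target_h) <= hue_tol


def patch_means(path, boxes):
    """Mean RGB of each (left, upper, right, lower) box in the image."""
    # Imported here so the color helpers above work without PIL installed.
    from PIL import Image, ImageStat

    with Image.open(path) as img:
        rgb_img = img.convert("RGB")
        return [tuple(ImageStat.Stat(rgb_img.crop(box)).mean)
                for box in boxes]


def separator_score(path, boxes):
    """Fraction of patches (0.0 to 1.0) whose mean color matches the target."""
    means = patch_means(path, boxes)
    hits = sum(looks_like_target(m) for m in means)
    return hits / len(means)
```

Calling separator_score("page0001.jpg", [(50, 50, 150, 150), (400, 50, 500, 150)])
with a hypothetical file name and patch positions would return 1.0 when
every patch matches and lower values as fewer patches match, which is
roughly the "percentage match" asked about in the original question.
Comparing hue rather than raw RGB should make the check less sensitive to
scans that come out lighter or darker than the reference.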


