Using PIL to find separator pages
Larry Bates
larry.bates at websafe.com
Fri Jun 1 11:51:11 EDT 2007
Steve Holden wrote:
> Larry Bates wrote:
>> I have a project that I wanted to solicit some advice
>> on from this group. I have millions of pages of scanned
>> documents with each page in an individual .JPG file.
>> When the documents were scanned the people that did
>> the scanning put a colored (hot pink) separator page
>> between the individual documents. I was wondering if
>> there was any way to utilize PIL to scan through the
>> individual files, look at some small section on the
>> page, and determine if it is a separator page by
>> somehow comparing the color to the separator page
>> color? I realize that this would be some sort of
>> percentage match where 100% would be a perfect match
>> and any number lower would indicate that it was less
>> likely that it was a separator page.
>>
>> Thanks in advance for any thoughts or advice.
>>
> I suspect the easiest way would be to select a few small patches of each
> image and average the color values of the pixels, then normalize to hue
> rather than RGB.
>
> Close enough to the hue you want (and you could include saturation and
> intensity too, if you felt like it) across several areas of the page
> would be a hit for a separator.
>
> regards
> Steve
Steve,
I'm completely lost on how to proceed. I don't know how to average color
values, normalize to hue... Any guidance you could give would be greatly
appreciated.
Thanks in advance,
Larry
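
For reference, here is a minimal sketch of the approach Steve describes:
average the color of a few small patches, convert each average from RGB
to hue and saturation, and score how close it is to the separator color.
It assumes Pillow (the maintained PIL fork) is installed. The reference
color (255, 105, 180), the hue tolerance, and all function names are
assumptions made for illustration, not anything from the thread; the
reference color in particular should be sampled from a real scanned
separator page.

```python
import colorsys

# Assumed reference color for "hot pink"; replace with a value sampled
# from one of the actual scanned separator pages.
SEPARATOR_RGB = (255, 105, 180)

# Assumed tolerance: a hue difference of 0.1 (36 degrees) or more scores 0.
HUE_TOLERANCE = 0.1


def hue_sat(rgb):
    """Convert a 0-255 RGB triple to (hue, saturation), each in 0..1."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h, s


def match_percent(rgb, target=SEPARATOR_RGB):
    """Score from 0 to 100 for how close rgb is to target in hue and saturation."""
    h, s = hue_sat(rgb)
    th, ts = hue_sat(target)

    # Hue is circular, so 0.95 and 0.05 are only 0.1 apart.
    dh = abs(h - th)
    dh = min(dh, 1.0 - dh)
    hue_score = max(0.0, 1.0 - dh / HUE_TOLERANCE)

    # White and gray pages have near-zero saturation and a meaningless
    # hue, so the saturation ratio pulls their score toward 0.
    hi = max(s, ts)
    sat_score = min(s, ts) / hi if hi > 0 else 1.0

    return 100.0 * hue_score * sat_score


def page_match_percent(path, boxes, target=SEPARATOR_RGB):
    """Score a page image by its worst-matching patch.

    boxes is a list of (left, upper, right, lower) pixel rectangles.
    """
    # Imported here so the color math above can be used without Pillow.
    from PIL import Image, ImageStat

    scores = []
    with Image.open(path) as img:
        rgb_img = img.convert("RGB")
        for box in boxes:
            patch = rgb_img.crop(box)
            # ImageStat.Stat(...).mean is the per-band average: [R, G, B].
            mean_rgb = ImageStat.Stat(patch).mean
            scores.append(match_percent(mean_rgb, target))
    # Every patch has to look like the separator, so take the minimum.
    return min(scores)
```

A call might look like
page_match_percent("page0001.jpg", [(100, 100, 200, 200), (100, 800, 200, 900)])
where the file name and patch coordinates are placeholders. Taking the
minimum over patches is one possible choice; it keeps a page with a single
pink photo or stamp from being counted as a separator. The cutoff score
for calling a page a separator would need to be tuned against known
separator and non-separator scans.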