Using PIL to find separator pages

Steve Holden steve at
Fri Jun 1 18:35:03 CEST 2007

Larry Bates wrote:
> Steve Holden wrote:
>> Larry Bates wrote:
>>> I have a project that I wanted to solicit some advice
>>> on from this group.  I have millions of pages of scanned
>>> documents with each page in an individual .JPG file.
>>> When the documents were scanned the people that did
>>> the scanning put a colored (hot pink) separator page
>>> between the individual documents.  I was wondering if
>>> there was any way to utilize PIL to scan through the
>>> individual files, look at some small section on the
>>> page, and determine if it is a separator page by
>>> somehow comparing the color to the separator page
>>> color?  I realize that this would be some sort of
>>> percentage match where 100% would be a perfect match
>>> and any number lower would indicate that it was less
>>> likely that it was a separator page.
>>> Thanks in advance for any thoughts or advice.
>> I suspect the easiest way would be to select a few small patches of each
>> image and average the color values of the pixels, then convert that
>> average to a hue and compare hues rather than raw RGB values.
>> Close enough to the hue you want (and you could include saturation and
>> intensity too, if you felt like it) across several areas of the page
>> would be a hit for a separator.
>> regards
>>  Steve
> Steve,
> I'm completely lost on how to proceed.  I don't know how to average color
> values, normalize to hue...  Any guidance you could give would be greatly
> appreciated.
> Thanks in advance,
> Larry

I'd like to help but I don't have any sample code to hand. Maybe someone 
who does could give you more of a clue. Let's hope so, anyway ...
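
In the meantime, here is a rough, untested-on-real-scans sketch of what I
meant by averaging patches and comparing hue. Everything specific in it is
an assumption on my part: the function names, the target colour (a nominal
"hot pink" of 255, 105, 180), the tolerances and the patch boxes are all
made up for illustration, so measure one of your real separator pages and
adjust. It uses PIL's ImageStat to average each patch and the standard
colorsys module for the RGB-to-hue conversion.

```python
import colorsys

# ASSUMPTION: nominal "hot pink"; replace with the average colour
# measured from one of your own separator scans.
TARGET_RGB = (255, 105, 180)


def to_hsv(rgb):
    """Convert an (r, g, b) triple in 0..255 to (h, s, v) in 0..1."""
    r, g, b = [c / 255.0 for c in rgb]
    return colorsys.rgb_to_hsv(r, g, b)


def hue_distance(h1, h2):
    """Distance between two hues; hue is circular, so the result is 0..0.5."""
    d = abs(h1 - h2)
    return min(d, 1.0 - d)


def colour_matches(avg_rgb, target=TARGET_RGB, hue_tol=0.05, min_sat=0.3):
    """True if an averaged patch colour is close in hue to the target.

    hue_tol and min_sat are guessed thresholds. The saturation check
    stops white or grey paper (which has no meaningful hue) from matching.
    """
    h, s, _v = to_hsv(avg_rgb)
    target_h = to_hsv(target)[0]
    return s >= min_sat and hue_distance(h, target_h) <= hue_tol


def is_separator(path, boxes):
    """True if every patch in boxes matches the target colour.

    boxes is a list of (left, upper, right, lower) pixel rectangles.
    """
    # Imported here so the helpers above can be tried without PIL installed.
    from PIL import Image, ImageStat

    img = Image.open(path).convert("RGB")
    for box in boxes:
        avg_rgb = ImageStat.Stat(img.crop(box)).mean  # per-band averages
        if not colour_matches(avg_rgb):
            return False
    return True
```

You would call it with something like
is_separator("page0001.jpg", [(50, 50, 100, 100), (400, 600, 450, 650)]),
where the file name and boxes are placeholders. Requiring every patch to
match is the strict choice; if stamps or handwriting land on the separator
pages you might prefer to accept, say, two hits out of three instead.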

Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd 
Skype: holdenweb

More information about the Python-list mailing list