Using PIL to find separator pages
steve at holdenweb.com
Fri Jun 1 18:35:03 CEST 2007
Larry Bates wrote:
> Steve Holden wrote:
>> Larry Bates wrote:
>>> I have a project that I wanted to solicit some advice
>>> on from this group. I have millions of pages of scanned
>>> documents with each page in an individual .JPG file.
>>> When the documents were scanned the people that did
>>> the scanning put a colored (hot pink) separator page
>>> between the individual documents. I was wondering if
>>> there was any way to utilize PIL to scan through the
>>> individual files, look at some small section on the
>>> page, and determine if it is a separator page by
>>> somehow comparing the color to the separator page
>>> color? I realize that this would be some sort of
>>> percentage match where 100% would be a perfect match
>>> and any number lower would indicate that it was less
>>> likely that it was a coverpage.
>>> Thanks in advance for any thoughts or advice.
>> I suspect the easiest way would be to select a few small patches of each
>> image and average the color values of the pixels, then normalize to hue
>> rather than RGB.
>> Close enough to the hue you want (and you could include saturation and
>> intensity too, if you felt like it) across several areas of the page
>> would be a hit for a separator.
> I'm completely lost on how to proceed. I don't know how to average color
> values, normalize to hue... Any guidance you could give would be greatly
> appreciated.
> Thanks in advance,
I'd like to help but I don't have any sample code to hand. Maybe someone
who does could give you more of a clue. Let's hope so, anyway ...
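
In the meantime, here is a rough sketch of the patch-averaging idea
described above, not tested on real scans. It assumes the Pillow fork of
PIL and the standard colorsys module. The target hue, both thresholds,
the patch positions and all function names are my own guesses for
illustration; in particular the hue is taken from CSS "hot pink", so
measure it from a known separator scan before trusting any of it.

```python
import colorsys

try:
    from PIL import Image, ImageStat
except ImportError:  # Pillow missing: the colour helpers below still work
    Image = ImageStat = None

# ASSUMPTION: hue of the separator paper, taken from CSS "hot pink"
# (255, 105, 180). Measure a real scanned separator and use that instead.
TARGET_HUE = colorsys.rgb_to_hsv(255 / 255, 105 / 255, 180 / 255)[0]
HUE_TOLERANCE = 0.06   # assumed; fraction of the colour wheel (~22 degrees)
MIN_SATURATION = 0.30  # assumed; rejects white and grey paper


def hue_distance(h1, h2):
    """Distance between two hues on the colour wheel (hue wraps at 1.0)."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


def rgb_matches(rgb):
    """True if an average (r, g, b) in 0-255 is close to the target hue."""
    h, s, _v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    return s >= MIN_SATURATION and hue_distance(h, TARGET_HUE) <= HUE_TOLERANCE


def patch_boxes(width, height, size=40):
    """A few small (left, upper, right, lower) boxes spread over the page."""
    centres = [(0.25, 0.25), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.75, 0.75)]
    boxes = []
    for fx, fy in centres:
        left, upper = int(width * fx), int(height * fy)
        boxes.append((left, upper, min(left + size, width), min(upper + size, height)))
    return boxes


def separator_score(img):
    """Fraction of sampled patches (0.0 to 1.0) whose mean colour matches."""
    img = img.convert("RGB")
    boxes = patch_boxes(*img.size)
    hits = 0
    for box in boxes:
        mean_rgb = ImageStat.Stat(img.crop(box)).mean  # per-band average
        hits += rgb_matches(mean_rgb)
    return hits / len(boxes)
```

Calling separator_score(Image.open(path)) on each page gives the kind of
percentage match asked for: 1.0 means every sampled patch matched, and a
cutoff somewhere below that (to be tuned on known pages) would flag a
separator. Hue is compared on a circle because pinks sit near the point
where the hue value wraps around.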
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden