ImSim: Image Similarity
Mel
mwilson at the-wire.com
Sat Mar 5 12:10:42 EST 2011
n00m wrote:
>
> I uploaded a new version of the subject with a
> VERY MINOR correction in it. Namely, in line #55:
>
> print '%12s %7.2f' % (db[k][1], db[k][0] / 3600.0,)
>
> instead of
>
> print '%12s %7.2f' % (db[k][1], db[k][0] * 0.001,)
>
> I.e. I normalized it to base = 100.
> Now the values of similarity can't be greater than 100
> and can be treated as some "regular" percents (%%).
>
> Also, due to this change, the *empirical* threshold of
> "system alarmity" moved down from "number 70" to "20%".
>
>   bears2.jpg
> --------------------
>   bears2.jpg    0.00
>   bears3.jpg   15.37
>   bears1.jpg   19.13
>     sky1.jpg   23.29
>     sky2.jpg   23.45
>      ff1.jpg   25.37
>    lake1.jpg   26.43
>   water1.jpg   26.93
>      ff2.jpg   28.43
>   roses1.jpg   31.95
>   roses2.jpg   36.12
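The 3600 divisor does work as a 0..100 normalization, assuming the script's setup: every image is resized to 300x300 and split into 5 row bands and 5 column bands of 60x300 pixels each. Each band histogram therefore holds 18000 counts. Two histograms with the same total can differ by at most twice that total, which happens when they share no brightness level. A quick check of the arithmetic:

```python
# Assumes the script's setup: 300x300 greyscale image, 5 row bands and
# 5 column bands, each band histogram counting 4 brightness levels.
W = H = 300
BANDS = 5

pixels_per_band = (W * H) // BANDS         # 60 x 300 = 18000 pixels per histogram
max_per_histogram = 2 * pixels_per_band    # histograms with no level in common
max_total = 2 * BANDS * max_per_histogram  # 5 row + 5 column histograms

print('%d %.1f' % (max_total, max_total / 3600.0))  # → 360000 100.0
```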
I'd like to see a *lot* more structure in there, broken into functions so they
could be imported and used from another program. Once I'd figured out what it
was doing, I had this:
from PIL import Image
from PIL import ImageStat

def row_column_histograms (file_name):
    '''Resize the image to 300x300, reduce it to greyscale brightness levels
    0..3, and return 4-level histograms for 5 horizontal bands (across Y)
    followed by 5 vertical bands (across X): a 10-item list of 4-item lists.'''
    im = Image.open (file_name)
    im = im.convert ('L')   # convert to 8-bit greyscale
    w, h = 300, 300
    im = im.resize ((w, h))
    imst = ImageStat.Stat (im)
    sr = imst.mean[0]   # average pixel level in band 0
    sr_low, sr_mid, sr_high = (sr*2)/3, sr, (sr*4)/3
    def foo (t):
        # map a 0..255 pixel value to brightness level 0..3
        if t < sr_low: return 0
        if t < sr_mid: return 1
        if t < sr_high: return 2
        return 3
    im = im.point (foo)   # reduce to brightness levels 0..3
    yhist = [[0]*4 for i in xrange(5)]
    xhist = [[0]*4 for i in xrange(5)]
    for y in xrange (h):
        for x in xrange (w):
            k = im.getpixel ((x, y))
            yhist[y // 60][k] += 1   # 300 / 5 = 60 pixels per band
            xhist[x // 60][k] += 1
    return yhist + xhist

def difference_ranks (test_histogram, sample_histograms):
    '''Return a list of difference ranks between the test histograms and
    each of the samples.'''
    result = [0]*len (sample_histograms)
    for k, s in enumerate (sample_histograms):   # for each image
        for i in xrange(10):   # for each histogram slot
            for j in xrange(4):   # for each brightness level
                result[k] += abs (s[i][j] - test_histogram[i][j])
    return result

if __name__ == '__main__':
    import getopt, sys
    opts, args = getopt.getopt (sys.argv[1:], '', [])
    if not args:
        args = [
            'bears1.jpg',
            'bears2.jpg',
            'bears3.jpg',
            'roses1.jpg',
            'roses2.jpg',
            'ff1.jpg',
            'ff2.jpg',
            'sky1.jpg',
            'sky2.jpg',
            'water1.jpg',
            'lake1.jpg',
        ]
        test_pic = 'bears2.jpg'
    else:
        test_pic, args = args[0], args[1:]
    z = [row_column_histograms (a) for a in args]
    test_z = row_column_histograms (test_pic)
    file_ranks = zip (difference_ranks (test_z, z), args)
    file_ranks.sort()
    print '%12s' % (test_pic,)
    print '--------------------'
    for r in file_ranks:
        print '%12s %7.2f' % (r[1], r[0] / 3600.0,)
(I've left out a few comments that wrapped around.) The test case still agrees
with your archived version, with the same ordering and scores within 0.1:
mwilson at tecumseth:~/sandbox/im_sim$ python image_rank.py bears2.jpg *.jpg
  bears2.jpg
--------------------
  bears2.jpg    0.00
  bears3.jpg   15.37
  bears1.jpg   19.20
    sky1.jpg   23.20
    sky2.jpg   23.37
     ff1.jpg   25.30
   lake1.jpg   26.38
  water1.jpg   26.98
     ff2.jpg   28.43
  roses1.jpg   32.01
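For anyone who wants to poke at the algorithm without PIL or image files, here is a hedged, pure-Python sketch of the same pipeline that also runs on Python 3 (where `zip` no longer returns a sortable list). Each pixel is quantized to level 0..3 against the image mean, using the script's cut points at 2/3, 1 and 4/3 of the mean. The sketch then counts levels in 5 row bands and 5 column bands, sums the absolute differences (an L1 distance) against each sample, and sorts the results. It skips the 300x300 resize and takes a plain list of rows of 0..255 values instead. `quantize_level`, `band_histograms` and `rank_files` are hypothetical names, not part of the script. Band indices are computed as `y * bands // h`, which equals `y / 60` for a 300-pixel side but also copes with sides that aren't a multiple of 5.

```python
def quantize_level(t, mean):
    """Map a 0..255 brightness to level 0..3, with cut points at
    2/3, 1 and 4/3 of the image's mean brightness (as in the script)."""
    if t < mean * 2 / 3.0:
        return 0
    if t < mean:
        return 1
    if t < mean * 4 / 3.0:
        return 2
    return 3


def band_histograms(pixels, bands=5):
    """pixels: list of equal-length rows of 0..255 ints.
    Returns `bands` row-band histograms followed by `bands` column-band
    histograms, each a 4-item list of level counts."""
    h, w = len(pixels), len(pixels[0])
    mean = sum(sum(row) for row in pixels) / float(w * h)
    yhist = [[0] * 4 for _ in range(bands)]
    xhist = [[0] * 4 for _ in range(bands)]
    for y, row in enumerate(pixels):
        for x, t in enumerate(row):
            k = quantize_level(t, mean)
            yhist[y * bands // h][k] += 1
            xhist[x * bands // w][k] += 1
    return yhist + xhist


def difference_ranks(test_histogram, sample_histograms):
    """L1 distance between the test histograms and each sample's histograms."""
    return [sum(abs(a - b)
                for s_hist, t_hist in zip(sample, test_histogram)
                for a, b in zip(s_hist, t_hist))
            for sample in sample_histograms]


def rank_files(test_histogram, sample_histograms, names):
    """(name, score) pairs, most similar first, ties broken by name.
    Each pixel is counted twice (row band and column band), so with equal
    pixel counts the L1 distance is at most 2 * total; dividing by
    2 * total / 100 gives 0..100 (3600 for a 300x300 image)."""
    total = sum(sum(hist) for hist in test_histogram)   # 2 * w * h
    scale = 2 * total / 100.0
    diffs = difference_ranks(test_histogram, sample_histograms)
    return [(n, d / scale) for d, n in sorted(zip(diffs, names))]


# Tiny synthetic 10x10 "images" stand in for the JPEG files.
half = [[20] * 5 + [220] * 5 for _ in range(10)]   # dark left, bright right
flipped = [row[::-1] for row in half]               # bright left, dark right
test = band_histograms(half)
print(test[5:])
# → [[20, 0, 0, 0], [20, 0, 0, 0], [10, 0, 0, 10], [0, 0, 0, 20], [0, 0, 0, 20]]
samples = [band_histograms(half), band_histograms(flipped)]
print(difference_ranks(test, samples))   # → [0, 160]
for name, score in rank_files(test, samples, ['same.jpg', 'flipped.jpg']):
    print('%12s %7.2f' % (name, score))
```

The mirrored copy scores 40 out of 100 even though its brightness content is identical. The column bands make the measure position-sensitive, which suits near-duplicate detection better than "same kind of picture" matching.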
I'd vaguely wanted to do something like this for a while, but I never dug
far enough into PIL to even get started. An additional kind of ranking that
takes colour into account would also be good; that's the one I'd have started
with, and never did.
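One way a colour ranking might go, offered only as a sketch: run the same four-level quantization separately on the R, G and B channels, each against its own mean, and concatenate the band histograms so the existing L1 comparison applies unchanged. The per-channel split is an assumption, not anything from the script. `colour_band_histograms` and `l1` are hypothetical names. The input is a plain list of rows of (r, g, b) tuples; with PIL, such tuples could come from `im.convert('RGB')` and `getpixel((x, y))`.

```python
def colour_band_histograms(pixels, bands=5):
    """pixels: list of equal-length rows of (r, g, b) tuples, 0..255 each.
    Each channel is quantized to levels 0..3 against its own mean (cut points
    at 2/3, 1 and 4/3 of that mean); returns the row-band then column-band
    histograms for R, then G, then B (3 * 2 * bands histograms in all)."""
    h, w = len(pixels), len(pixels[0])
    result = []
    for c in range(3):
        mean = sum(p[c] for row in pixels for p in row) / float(w * h)
        cuts = (mean * 2 / 3.0, mean, mean * 4 / 3.0)
        yhist = [[0] * 4 for _ in range(bands)]
        xhist = [[0] * 4 for _ in range(bands)]
        for y, row in enumerate(pixels):
            for x, p in enumerate(row):
                k = sum(1 for cut in cuts if p[c] >= cut)  # cut points passed: 0..3
                yhist[y * bands // h][k] += 1
                xhist[x * bands // w][k] += 1
        result += yhist + xhist
    return result


def l1(a, b):
    """Summed absolute difference between two lists of histograms."""
    return sum(abs(x - y) for ha, hb in zip(a, b) for x, y in zip(ha, hb))


red_blue = [[(200, 0, 0)] * 5 + [(0, 0, 200)] * 5 for _ in range(10)]
blue_red = [row[::-1] for row in red_blue]
a = colour_band_histograms(red_blue)
b = colour_band_histograms(blue_red)
print('%d %d %d' % (len(a), l1(a, a), l1(a, b)))   # → 30 0 320
```

With three times as many histograms, the maximum raw score triples too, so for 300x300 images the divisor would become 10800 instead of 3600 to keep a 0..100 scale.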
Cheers, Mel.
More information about the Python-list mailing list