ImSim: Image Similarity

Sat Mar 5 12:10:42 EST 2011

n00m wrote:

> 
> I uploaded a new version of the subject with a
> VERY MINOR correction in it. Namely, in line #55:
> 
>     print '%12s %7.2f' % (db[k][1], db[k][0] / 3600.0,)
> 
> instead of
> 
>     print '%12s %7.2f' % (db[k][1], db[k][0] * 0.001,)
> 
> I.e. I normalized it to base = 100.
> Now the values of similarity can't be greater than 100
> and can be treated as some "regular" percents (%%).
> 
> Also, due to this change, the *empirical* threshold of
> "system alarmity" moved down from "number 70" to "20%".
> 
>   bears2.jpg
> --------------------
>   bears2.jpg    0.00
>   bears3.jpg   15.37
>   bears1.jpg   19.13
>     sky1.jpg   23.29
>     sky2.jpg   23.45
>      ff1.jpg   25.37
>    lake1.jpg   26.43
>   water1.jpg   26.93
>      ff2.jpg   28.43
>   roses1.jpg   31.95
>   roses2.jpg   36.12
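
A note on where the 3600.0 comes from, as far as I can work out from the
restructured script further down.  This assumes its 300x300 resize and its
5 row bands plus 5 column bands, which may not match every version of the
original.  Each pixel is counted once in a row histogram and once in a
column histogram, and the worst case is two images with no bin in common.
The constant names here are mine.

```python
# Worked arithmetic for the 3600.0 divisor (a sketch, assuming the
# 300x300 resize and the row + column band histograms described above).
WIDTH, HEIGHT = 300, 300

pixels = WIDTH * HEIGHT                  # 90000 pixels per image
counts_per_image = 2 * pixels            # each pixel lands in one row bin and one column bin
max_raw_difference = 2 * counts_per_image  # two images sharing no bin at all

# Dividing the raw sum by 3600.0 therefore caps the score at 100.
print(max_raw_difference / 3600.0)
```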

I'd like to see a *lot* more structure in there, with modularization, so the 
internal functions could be used from another program.  Once I'd figured out 
what it was doing, I had this:

from PIL import Image
from PIL import ImageStat

def row_column_histograms (file_name):
    '''Resize the image to 300x300 greyscale and reduce it to brightness
    levels 0..3 relative to the mean.  Return brightness histograms for
    5 horizontal bands (Y) and 5 vertical bands (X), packed into a
    10-item list of 4-item histograms.'''
    im = Image.open (file_name)
    im = im.convert ('L')	# convert to 8-bit b/w
    w, h = 300, 300
    im = im.resize ((w, h))
    imst = ImageStat.Stat (im)
    sr = imst.mean[0]	# average pixel level in layer 0
    sr_low, sr_mid, sr_high = (sr*2)/3, sr, (sr*4)/3
    def quantize (t):
        # map a pixel value to a level 0..3 relative to the mean brightness
        if t < sr_low: return 0
        if t < sr_mid: return 1
        if t < sr_high: return 2
        return 3
    im = im.point (quantize)	# reduce to brightness levels 0..3
    yhist = [[0]*4 for i in xrange(5)]
    xhist = [[0]*4 for i in xrange(5)]
    for y in xrange (h):
        for x in xrange (w):
            k = im.getpixel ((x, y))
            yhist[y / 60][k] += 1
            xhist[x / 60][k] += 1
    return yhist + xhist

def difference_ranks (test_histogram, sample_histograms):
    '''Return a list of difference ranks between the test histograms and
    each of the samples.'''
    result = [0]*len (sample_histograms)
    for k, s in enumerate (sample_histograms):	# for each image
        for i in xrange(10):	# for each histogram slot
            for j in xrange(4):	# for each brightness level
                result[k] += abs (s[i][j] - test_histogram[i][j])	
    return result

if __name__ == '__main__':
    import getopt, sys
    opts, args = getopt.getopt (sys.argv[1:], '', [])
    if not args:
        args = [
            'bears1.jpg',
            'bears2.jpg',
            'bears3.jpg',
            'roses1.jpg',
            'roses2.jpg',
            'ff1.jpg',
            'ff2.jpg',
            'sky1.jpg',
            'sky2.jpg',
            'water1.jpg',
            'lake1.jpg',
        ]
        test_pic = 'bears2.jpg' 
    else:
        test_pic, args = args[0], args[1:]

    z = [row_column_histograms (a) for a in args]
    test_z = row_column_histograms (test_pic)

    file_ranks = zip (difference_ranks (test_z, z), args)	
    file_ranks.sort()

    print '%12s' % (test_pic,)
    print '--------------------'
    for r in file_ranks:
        print '%12s %7.2f' % (r[1], r[0] / 3600.0,)

(I've omitted a few comments that wrapped around.)  The test case still 
agrees closely with your archived version:

mwilson at tecumseth:~/sandbox/im_sim$ python image_rank.py bears2.jpg *.jpg
  bears2.jpg
--------------------
  bears2.jpg    0.00
  bears3.jpg   15.37
  bears1.jpg   19.20
    sky1.jpg   23.20
    sky2.jpg   23.37
     ff1.jpg   25.30
   lake1.jpg   26.38
  water1.jpg   26.98
     ff2.jpg   28.43
  roses1.jpg   32.01
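
To illustrate the point about reuse, here is a rough sketch of calling the
ranking step on its own, with no image files involved.  It restates
difference_ranks so that it runs on Python 3 and accepts any number of
bands and levels (the script above hard-codes 10 and 4).  The toy
histograms and the names near and far are invented for the example.

```python
def difference_ranks(test_histogram, sample_histograms):
    """Return, for each sample, the sum of absolute bin differences
    against the test histograms."""
    return [
        sum(abs(s - t)
            for s_hist, t_hist in zip(sample, test_histogram)
            for s, t in zip(s_hist, t_hist))
        for sample in sample_histograms
    ]

# Toy data: two bands of four brightness levels each (made up, not from real images).
test = [[4, 0, 0, 0], [0, 4, 0, 0]]
samples = {
    'near': [[3, 1, 0, 0], [0, 4, 0, 0]],
    'far':  [[0, 0, 0, 4], [0, 0, 4, 0]],
}

names = list(samples)
ranks = difference_ranks(test, [samples[n] for n in names])

# Smallest difference first, as in the listing above.
for score, name in sorted(zip(ranks, names)):
    print('%12s %7d' % (name, score))
```

Another program would import the function instead of restating it; the
zip-based loops are only there so the band and level counts need not be
fixed.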

I'd vaguely wanted to do something like this for a while, but I never dug 
far enough into PIL to even get started.  An additional kind of ranking that 
takes colour into account would also be good -- that was the first idea I 
never got around to.
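
For the colour idea, a starting point (not tried on real photos) might be
a four-bin histogram per R, G and B channel, compared the same way as the
greyscale ones.  This sketch assumes fixed bins of 64 levels each rather
than thresholds relative to the mean, on the guess that mean-relative
levels would hide an overall colour cast.  The function names and pixel
lists are hypothetical; with PIL the tuples could presumably come from
getpixel on an image converted with convert ('RGB').

```python
def channel_histograms(pixels):
    """pixels: list of (r, g, b) tuples with values 0..255.
    Return three 4-bin histograms, one per channel, using fixed
    bins of 64 levels each (an assumption, not the greyscale scheme)."""
    hists = [[0] * 4 for _ in range(3)]
    for pixel in pixels:
        for channel in range(3):
            hists[channel][pixel[channel] // 64] += 1
    return hists

def colour_difference(a, b):
    """Sum of absolute bin differences across all three channels."""
    return sum(abs(x - y)
               for hist_a, hist_b in zip(a, b)
               for x, y in zip(hist_a, hist_b))

# Made-up four-pixel "images", one mostly red and one mostly blue.
reddish = [(200, 10, 10)] * 3 + [(100, 10, 10)]
bluish = [(10, 10, 200)] * 3 + [(10, 10, 100)]

print(channel_histograms(reddish))
print(colour_difference(channel_histograms(reddish),
                        channel_histograms(bluish)))
```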

	Cheers,		Mel.