[Image-SIG] Filtering out all but black pixels for OCR

Karsten Hiddemann karsten.hiddemann at mathematik.uni-dortmund.de
Wed Jul 2 11:30:20 CEST 2008


Mike Meisner wrote:
> I'd like to use PIL to prep an image file to improve OCR quality.
>  
> Specifically, I need to filter out all but black pixels from the image 
> (i.e., convert all non-black pixels to white while retaining the black 
> pixels).

You could do something like the following:

from PIL import Image

# convert("RGB") drops any alpha channel and expands palette or
# greyscale images, so every pixel compares as an (r, g, b) tuple
img = Image.open("sample.png").convert("RGB")
(xdim, ydim) = img.size
black = (0, 0, 0)
white = (255, 255, 255)

data = img.load()
if data is not None:
	# PIL 1.1.6 and later: load() returns a pixel access object
	for y in range(ydim):
		for x in range(xdim):
			if data[x, y] != black:
				data[x, y] = white
else:
	# older PIL: the sequence from getdata() is read-only, so build
	# a new pixel list and write it back with putdata()
	pixels = [p if p == black else white for p in img.getdata()]
	img.putdata(pixels)

img.save("sample-filtered.png")
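The pixel loop above is slow in pure Python on full-page scans. Here is a minimal loop-free sketch, assuming a PIL/Pillow version that provides ImageChops.lighter() and Image.point() with a function argument. The names keep_black and tolerance are made up for illustration. The tolerance parameter is my own addition (scans rarely contain exact (0, 0, 0)); with tolerance=0 the result matches the loop above. It returns a greyscale ("L") image, which is usually fine for OCR.

```python
from PIL import Image, ImageChops


def keep_black(img, tolerance=0):
    """Hypothetical helper: 0 where every RGB channel is <= tolerance, else 255."""
    r, g, b = img.convert("RGB").split()
    # per-pixel maximum of the three channels; a pixel counts as black
    # only if even its brightest channel is within the tolerance
    brightest = ImageChops.lighter(ImageChops.lighter(r, g), b)
    # point() on an "L" image builds a 256-entry lookup table from the function
    return brightest.point(lambda v: 0 if v <= tolerance else 255)


if __name__ == "__main__":
    demo = Image.new("RGB", (3, 1))
    demo.putpixel((0, 0), (0, 0, 0))
    demo.putpixel((1, 0), (10, 5, 0))
    demo.putpixel((2, 0), (200, 30, 40))
    print(list(keep_black(demo).getdata()))                # [0, 255, 255]
    print(list(keep_black(demo, tolerance=16).getdata()))  # [0, 0, 255]
```

Call keep_black(img).convert("RGB") if you need an RGB file back.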

