Extract a bordered, skewed rectangle from an image

David Bolen db3l.net at gmail.com
Fri May 7 17:41:16 EDT 2010


"Paul Hemans" <darwin at nowhere.com> writes:

> I am wondering whether there are any people here that have experience with 
> openCV and Python. If so, could you either give me some pointers on how to 
> approach this, or if you feel so inclined, bid on the project. There are 2 
> problems:

Can't offer actual services, but I've done image tracking and object
identification in Python with OpenCV, so I can suggest some
approaches.

You might also try the OpenCV mailing list, though its S/N ratio
sometimes varies wildly.

And for OpenCV specifically, I definitely recommend the book
"Learning OpenCV" from O'Reilly.  It's really hard to grasp the
concepts and applications of the raw OpenCV calls from the API
documentation alone, and I found the book (albeit not cheap) helped
me tremendously and was well worth it.

I'll flip the two questions since the second is quicker to answer.

> How to do this through Python into openCV? I am a newbie to Python, not 
> strong in Maths and ignorant of the usage of openCV.

After trying a few wrappers, the bulk of my experience is with the
ctypes-opencv wrapper and OpenCV 1.x (either 1.0 or 1.1pre).  Things
change a lot with the recent 2.x releases (which move to a C++ API
that wrappers have to accommodate), and I'm not sure the various
wrappers are as stable yet.  So if you don't have a hard requirement
for 2.x, I might suggest at least starting with 1.x and
ctypes-opencv, which is very robust - though I'm a little biased, as
I've contributed code to the wrapper.

> How do I get openCV to ignore the contents of the label and just focus on 
> the border?

There's likely no single answer, since there are multiple mechanisms
for identifying features in an image, and you can also derive
additional heuristics from your own knowledge of the domain space
(your own images).  Without knowing exactly what the border design is
(the one meant to make detection easy), it's hard to say anything
definitive.

But in broad strokes, you'll often:

  1. Normalize the image in some way.  This can be to adjust for
     brightness across various scans to make later processing more
     consistent, to switch color spaces (to make color matching more
     effective), or even to remove color altogether if it just
     complicates matters.  You may also mask off entire portions of
     the image if you have information that says they can't possibly
     be part of what you are looking for.
  2. Attempt to remove noise.  Even when portions of an image look
     like a solid color, at the pixel level there can be many small
     variations in pixel value.  Operations such as blurring or
     smoothing help average out those values and simplify matching
     entire regions.
  3. Attempt to identify the regions or features of interest.  Here's
     where a ton of algorithms may apply depending on your needs, but
     the simplest form to start with is basic color matching.  For
     edge detection (like that of your label border), convolutions
     (such as gradient detection) might also be ideal.
  4. Process identified regions to attempt to clean them up, if
     possible weakening regions likely to be extraneous, and
     strengthening those more likely to be correct.  Morphology
     operations are one class of processing likely to help here.
  5. Select among features (if more than one) to identify the best
     match, using any knowledge you may have that can be used to
     rank them (e.g., size, position in the image, etc.).
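
To make that concrete, here's a minimal skeleton of those five
stages.  I'll sketch it with the newer cv2/NumPy bindings for
brevity rather than the ctypes-opencv calls I actually use, and the
function name, color bounds, and kernel sizes are pure placeholders
you'd tune to your own label:

    import cv2
    import numpy as np

    def find_label(path):
        img = cv2.imread(path)            # load the scanned page

        # 1. Normalize: smooth pixel noise, then switch color spaces
        img = cv2.blur(img, (5, 5))
        hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)

        # 2/3. Identify candidate regions by basic color matching;
        #      these hue/lightness/saturation bounds are placeholders
        mask = cv2.inRange(hls, np.array([0, 40, 80], np.uint8),
                                np.array([180, 220, 255], np.uint8))

        # 4. Clean up: close gaps so matched areas become contiguous
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

        # 5. Select among remaining features - here, simply the largest
        contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]
        return max(contours, key=cv2.contourArea) if contours else None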

My own processing is ball tracking in motion video, so I have some
additional data in terms of adjacent frames that helps me remove
static background information and minimize the regions under
consideration for step 3; a single image probably won't have that.
But given that you have scanned documents, there may be other
simplifying rules you can use, like eliminating anything too white or
too black (depending on label color).

My own flow works like:

1. Normalize each frame

   1. Blur the frame (cvSmooth with CV_BLUR, 5x5 matrix).  This
      smooths out the pixel values, improving the color conversion.
   2. Balance brightness (in RGB space).  I ended up just offsetting
      the image by a fixed (x,x,x) value to maximize the RGB values.
      Found it worked better doing it in RGB before the Lab
      conversion.
   3. Convert the image to the "Lab" color space.  I used Lab because
      the conversion process was fastest, but when frame rate isn't
      critical, HLS is likely better, since hue/saturation are
      completely separate from lightness, which makes for easier
      color matching.
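
   As a quick sketch of that normalization in code (cv2 flavor again;
   the offset value is just a stand-in for whatever maximizes your
   particular images):

       import cv2
       import numpy as np

       def normalize(frame, offset=40):
           # 1.1 Blur with a 5x5 box filter (cvSmooth/CV_BLUR equivalent)
           frame = cv2.blur(frame, (5, 5))

           # 1.2 Brighten in RGB space by a fixed (x,x,x) offset;
           #     cv2.add saturates at 255 instead of wrapping around
           frame = cv2.add(frame, np.full(frame.shape, offset, frame.dtype))

           # 1.3 Convert to Lab (HLS if frame rate isn't critical)
           return cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)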

2. Identify uninteresting regions in the current frame

   This may not apply to you, but here is where I mask out static
   information from prior background frames, based on difference
   calculations with the current frame, or very dark areas that I
   knew couldn't include what I was interested in.

   In your case, for example, if you know the label is going to show
   up fairly saturated (say it's a solid red or something), you could
   probably eliminate everything that is below a certain saturation
   level.  Or if they are black and white documents, but the label has
   a color, it might be very easy to filter out everything but the
   label.

   If you're lucky, some simple heuristics applied here might have the
   net effect of masking the majority of your document image away,
   leaving primarily the label.
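
   A saturation filter like that is only a couple of calls (cv2
   flavor; the threshold is a guess you'd tune against real scans):

       import cv2

       def saturation_mask(frame, min_saturation=90):
           # HLS channel order is hue, lightness, saturation; keep only
           # pixels saturated enough to plausibly be the colored label
           hls = cv2.cvtColor(frame, cv2.COLOR_BGR2HLS)
           _, mask = cv2.threshold(hls[:, :, 2], min_saturation, 255,
                                   cv2.THRESH_BINARY)
           return mask    # 255 where worth keeping, 0 elsewhere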

3. Color matching

   1. Mask off regions of the image not falling within a specific Lab
      pixel range, sufficient to encompass my object under a variety of
      lighting/camera conditions.  I typically use cvInRangeS to set
      the mask bits for pixels within the range.
   2. Perform a morphological close - cvMorphologyEx against the
      mask with CV_MOP_CLOSE.  What this does is apply a dilation
      followed by an erosion.  The dilation combines nearby features
      with each other, while the following erosion shrinks the merged
      areas back toward their original size.  The net effect is to
      strengthen larger matched areas and help them become
      contiguous.  (If tiny stray matches are the bigger problem,
      CV_MOP_OPEN - an erosion followed by a dilation - is the
      variant that removes those.)

   Note in my case I was looking for a relatively solid color ball
   (it had gaps since it was a whiffle ball), so if, for example,
   your label is alternating colors, dashed lines, or something like
   that, it might not work as well.  There are more complicated
   algorithms that can match more elaborate patterns, sometimes with
   initial training on target images.
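
   Steps 3.1 and 3.2 together come out to something like this (cv2
   flavor; the Lab bounds are placeholders for whatever range covers
   your target under your conditions):

       import cv2
       import numpy as np

       def color_match(lab_frame):
           # 3.1 Mask off pixels outside the target Lab range
           #     (placeholder bounds - tune them against real scans)
           lower = np.array([20, 130, 130], np.uint8)
           upper = np.array([230, 200, 200], np.uint8)
           mask = cv2.inRange(lab_frame, lower, upper)

           # 3.2 Close (dilate, then erode) to merge nearby matches
           #     into contiguous blobs the contour finder can use
           kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
           return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)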

4. Object selection

   1. Locate all top level contours of any remaining solid areas
      in the mask (cvFindContours).  This will identify connected
      areas in the mask, so in your case, ideally one of the located
      contours would be the label edge.  This does assume that your
      feature identification in the prior step is likely to create
       contiguous areas.  Even just a few pixels of gap will net a
       non-closed contour, which is harder to work with, though the
       morphology operation will sometimes close those gaps.
   2. Evaluate "best" contour when multiple choices exist.  Very small
      areas are eliminated, and remaining areas are evaluated for
      average Lab value distance from a target point (somewhat
      arbitrarily chosen at this point to represent the "ideal" ball).
      The nearest (in color distance) contour is picked, except in the
      case of two "close" contours where the further contour can win
      if it is at least 4x (arbitrarily chosen) as large.  In your
       case, for example, any contours located within the label itself
       would necessarily be smaller than the label, so you could
       probably just pick the largest.  Also, when calling
       cvFindContours you can prevent it from finding "interior"
       contours.
   3. Compute and return a minimum bounding circle (center, radius)
      for the selected contour.  In your case, you'd likely just use
      the contour itself - you can use the contour (with 'n' line
      segments) as is, or convert into an approximate polygon.
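
   A sketch of that selection, simplified to "pick the largest" as
   suggested above (cv2 flavor):

       import cv2

       def pick_label(mask):
           # RETR_EXTERNAL keeps only top-level contours, so anything
           # printed inside the label border is ignored from the start
           contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)[-2]
           if not contours:
               return None
           largest = max(contours, key=cv2.contourArea)

           # Approximate the n-segment contour with a simpler polygon;
           # a clean rectangular label border often comes out 4-sided
           epsilon = 0.02 * cv2.arcLength(largest, True)
           return cv2.approxPolyDP(largest, epsilon, True)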

The nice thing about Python with OpenCV is the interactive
experimentation you can do right in the interpreter.  Open a highgui
window, load in your image, and experiment.  After each operation,
just quickly show the new image in the existing window or a new one.
You can keep several windows up to date as you run an image through
several transforms to see the results.
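
For example, an interpreter session can be as simple as (cv2 flavor;
with ctypes-opencv it's the cvNamedWindow/cvShowImage equivalents,
and the filename is a placeholder):

    import cv2

    img = cv2.imread("scan.png")       # placeholder filename
    cv2.imshow("original", img)

    blurred = cv2.blur(img, (5, 5))
    cv2.imshow("blurred", blurred)     # second window for comparison

    cv2.waitKey(0)                     # give highgui a chance to redraw
    cv2.destroyAllWindows()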

Hope this at least gives you some thoughts as to how to proceed.

-- David


