[spambayes-dev] Ocrad vs Tesseract OCR

Tue Sep 5 05:38:36 CEST 2006

>     Tony> * The license is a bit vague, unfortunately.
>
>     skip> I suppose we ought to contact the authors, just to be on the
>     skip> safe side.
>
> Perhaps it's not necessary.  The README file says:
>
>     This package contains the Tesseract Open Source OCR Engine.
>     Originally developed at Hewlett Packard Laboratories Bristol and
>     at Hewlett Packard Co, Greeley Colorado, the majority of the code
>     in this distribution is now licensed under the Apache License:
[...]
> The Apache license is fine for our use, right?

Sigh.  I don't know how I missed that (right at the top of the  
README), and yet managed to read the bit later on.

> And built successfully, with a couple of tweaks.  After a bit of juggling, I
> got the executable into the proper spot, ran it, then got a segfault.
> Unfortunately, the README file includes this:
>
>     The C++ code makes heavy use of a list system using macros.  This
>     predates stl, was portable before stl, and is more efficient than stl
>     lists, but has the big negative that if you do get a segmentation
>     violation, it is hard to debug.
>
> It's certainly not ready for prime time.

:(  Ah, well, it was worth a shot.  Thanks for doing the work!

When I find some time to do a proper evaluation of the new experimental
options, I might try it as well and see how I go (out of curiosity).
Were you building on OS X?

=Tony.Meyer