Hi friends,

as posted elsewhere, there is a NetPbm port for Windows. I think it would be beneficial for Windows users if it were possible to use that with SpamBayes instead of having to set up PIL/Python. I am trying to make ocrad run in combination with SpamBayes for Windows as well.

First: According to its version info, the ocrad.exe in the download section is really v. 0.14 and is exactly the same as the one in the Windows installer package (and not v. 0.15 as tagged in the download area!).

Secondly: Yes, the program runs from CMD and it actually produces very nice output, but the pnm file you link to is apparently invalid.

Third: You seem to have the scale parameter format wrong; that's why the output is so bad.

Ok, here is how to test it:
- Download IrfanView from http://www.irfanview.com/download_sites.htm
- Download and open the pnm file from the link
- Select Edit -> Copy, then Edit -> Paste in the menu (to create a new, non-corrupted image)
- Save as PGM (what I did, but other P*M formats should work too)
- Now execute ocrad

Here are two test runs. The first uses the incorrect parameter format as in your example. The second uses the correct scaling parameter:
-----------------------------------------------------------------------------------------------------------------------------
ocrad -4 Clipboard01.pgm
___AllE_llO_ ALL DA_ IRADER_ A_D |__E_IOR____ _omO_n_ __m_ _O_TWW__T_R__ W_Dl_AL. lh_ __o_k __mbol _wnw Wond__ _lo__ o 11 _ol_m_ _.TSl.TO_ _h_nO_ _P o o__ (_T TB_/_) w__k__ __o ___.ooo.ooo 00 (AOO_o_) _WWM p_ R___A___ BR_A_IWC w_w_ | _o__hw______ M_dl_ol _ol__lo__ |__ _PIWW_W__T_ _WWM_ |_ Ol_o__d _o o__o____ _ho_ |_ ho_ ______d |__o _h_ _o_mol __oO__ o_ __Oo_lo_lo__ __Oo_dl_O Po_|_|_ Rlm _o_lo__ dl___lb__lo_ _o_ |__ O_oO_|__o__ _obO_o_d_TM_ O_od___ ||__ D__lnO _h_ |___ _ mon_h_. _o__hw______ M_dl_ol ho_ h_ld dl_____lo__ Oo_h___d do_o o_d __Olo__d _h_ O_o_O___|__ ||____|_o o_ |__ _obO_o_d_TMl dloO_o__|_ _____m _o b_ _old _h_o_Oh |_____o_lo_oll_ bo__d dl___lb__lo_ _omOo_|__ |__o Po_|_|_ Rlm o__o_ Th_ |_|_lol |_______ ho_ b___ ___ _h_ _ol_ o_ _h_ O_od___ _o_ Oo____m___ _mOlo___ ____|_O O_oO_om_ ___h o_ Ooll__ _o____ lo_ol o_d __o__ |___| wo____ o_d ol_o |__|_d__ hlOh__ |___| wo_Wo____ R_o_o__ _|__d _o_ _h_ |_______ o_ _h_ _obO_o_d_TM_ O_od___ ho__ b___ _h_ __ho___d _o__ o_ ___ o_ _h_ O_od___ o_d lmm_dlo__ o_ _|__ _o__|_mo_lo_ o_ w_ll o_ |____o__d __lloblll__ o_ _h_ ____|__ l_w___ _k__oRIr_ h_w_ '_Ir_ _ow WOl_L) ADD lwl_ _EW lo _O_R RADAR A_D WAl_W IRADE o_ WO_DA_| DO_.| E_E_ BLl__| _w_w DOE__.| _LEEpl RPmo_al _ Dl__lalmP_ _ ---------------------------------------------------------------------------------------------------------------------------------
ocrad -s4 Clipboard01.pgm
?wwATTENTION ALL DAy.TRADERS AND INVESTORSwww . . . Company Name: 3auTww5_T5RPd M_ar__L, Ýh'_L Stock Symbol: 3w__q Monday Close: 0.11 Volume: 5,761,702 Change: uP 0.025 (27.78░_o) Market Cap: $33,000,000.00 (Approx) SWNM,PK RELEASES BREAKING NEWS ! . Sou_hwes_ern Medical Solu_ions, Inc, (PINKSHEETS_ SWNM), is pleased _o announce _ha_ i_ has en_e?ed in_o _he formal s_aes of nego_ia_ions regarding Pacific Rim na_ions dis_ribu_ion for i_s proprie_ary Labguard(TM) produc_ line, . . DurÝng _he las_ s mon_hs, Sou_hwes_ern Medical has held discussions, ga_hered da_a, and explored _he prospec_ive licensing of i_s Labguard(TM) diagnos_ic sys_em, _o be sold _hrough in_erna_ionally based dis_ribu_ion companies in_o Pacific Rim a_eas, The ini_ial in_eres_ has been fur _he sale of _he produc_ for governmen_ employee _es_ing programs such as police forces, local and s_a_e level workers, and also includes higher-level wor_orces, Reasons ci_ed for _he in_eres_ of _he Labguard(TM) produc_ have been _he enhanced ease of use of _he produc_ and immedia_e on-si_e confirma_ion as well as increased reliabili_y of _he resul_s, . . __W__K _A__aRIr_ __w_ _Ir_ _aw Ma___3 -- ADD THIS GEM TO YOUR RADAR AND WATCH TRADE ON MONDAY! -- DON'T EVEN BLINK! -- SWNM DOESN'T SLEEP! . . Removal_ _ Disclaimer_ _ -- Best regards Vibe
Kasper> as posted elsewhere there is a NetPbm port for Windows. I think
Kasper> it would be beneficial for Windows users if it was possible to
Kasper> use that with SpamBayes instead of having to setup PIL/Python.

Actually, no it wouldn't. I trust Fredrik Lundh much more to keep PIL happy on Windows than an unknown person to keep NetPBM happy on Windows. In addition, it's much easier for me to manipulate images in PIL than to use NetPBM to do it. And (here's the kicker), since I wrote the code, I get more votes than you. ;-)

Kasper> First: The ocrad.exe in the download section is according to the
Kasper> version info really v. 0.14 and is exactly the same as the one
Kasper> in the windows installer package (and not v. 0.15 as tagged in
Kasper> the download area!).

The only ocrad code I've ever downloaded was 0.15. It's hard to see how I could have 0.14. When I extracted the tar file it created a directory named "ocrad-0.15", and the program_version variable in main.cc is declared as

    const char * const program_version = "0.15";

Unless you have more convincing evidence to the contrary I think I have actually been using version 0.15. That said, Tony Meyer actually built the version that I uploaded to SourceForge. It's possible he used 0.14.

Kasper> Secondly: Yes, the program runs from CMD and it actually
Kasper> produces very nice output, but the pnm-file you link to are
Kasper> apparently invalid.

The ocrad folks didn't open the file properly. To the best of my knowledge there isn't anything wrong with the pnm files I've used as examples.

Kasper> Third: You seem to have the scale parameter format wrong, that's
Kasper> why the output is so bad.

The -2 I used was suggested by the ocrad author. I've tried a few other scales. It's not obvious that one scale is much better than another. Still, I should make this a user-settable option.

Kasper> Here are two test runs. First one with incorrect parameter
Kasper> format as in your example. Second with correct scaling
Kasper> parameter:
...

Whoops! Thanks, I'll fix that pronto.

Skip
Hi,
Kasper> as posted elsewhere there is a NetPbm port for Windows. I think Kasper> it would be beneficial for Windows users if it was possible to Kasper> use that with SpamBayes instead of having to setup PIL/Python.
Actually, no it wouldn't. I trust Fredrik Lundh much more to keep PIL happy on Windows than an unknown person to keep NetPBM happy on Windows. In addition, it's much easier for me to manipulate images in PIL than to use NetPBM to do it. And (here's the kicker), since I wrote the code, I get more votes than you. ;-)
He he, of course that's your decision. And let me say you're doing a great job ;) Their forum seems active, but I don't know any of the authors. I don't know how cygwin works, but I guess, that if either method can be compiled into an exe-setup, then fine. I do think that having to install three separate applications will frighten some windows-users, but okay...
Kasper> First: The ocrad.exe in the download section is according to the Kasper> version info really v. 0.14 (..) Unless you have more convincing evidence to the contrary I think I have actually been using version 0.15. That said, Tony Meyer actually built the version that I uploaded to SourceForge. It's possible he used 0.14.
Ok, that's probable. Here I get - with the downloaded file in windows:

    C:\> ocrad -V
    GNU Ocrad version 0.14
    Copyright (C) 2006 Antonio Diaz Diaz.
    This program is free software; you may redistribute it under the terms of the GNU General Public License.
    This program has absolutely no warranty.

File size is exactly the same as the one posted with the exe-installer in August.
Kasper> Secondly: Yes, the program runs from CMD and it actually Kasper> produces very nice output, but the pnm-file you link to are Kasper> apparently invalid.
The ocrad folks didn't open the file properly. To the best of my knowledge there isn't anything wrong with the pnm files I've used as examples.
Yes, I think you might be right. I will do some more tests on this. -- Best regards Vibe
Vibe> Their forum seems active, but I don't know any of the authors. I
Vibe> don't know how cygwin works, but I guess, that if either method
Vibe> can be compiled into an exe-setup, then fine. I do think that
Vibe> having to install three separate applications will frighten some
Vibe> windows-users, but okay...

Note also that PIL doesn't require forking processes. Note also that installing PIL on Windows is no different than installing SpamBayes or Python. Just double-click the installer's icon.

Vibe> Ok, that's probable. Here I get - with the downloaded file in
Vibe> windows:
Vibe> C:\> ocrad -V
Vibe> GNU Ocrad version 0.14
...

Okay, I'll have to presume Tony built with 0.14 sources. Tony?

Skip
Hi there,
Vibe> ... I do think that Vibe> having to install three separate applications will frighten some Vibe> windows-users, but okay...
... Note also that installing PIL on Windows is no different than installing SpamBayes or Python. Just double-click the installer's icon.
I know, but in the beginning I only used the SB exe. Only recently did I install CygWin, Python, PIL etc. to see what it was like. Fact is, I chose SpamBayes over other filters because a compiled exe was available. If it had only been available as interpreted py sources I probably wouldn't have jumped in. Of course now I know how great it is ;) Actually I think usability is often overlooked by you wizards ;) I really wanted to install and compile with MinGW but I had to give up... I couldn't figure out which of the ten or so different packages I had to download and how to use them...
Vibe> GNU Ocrad version 0.14 ... Okay, I'll have to presume Tony built with 0.14 sources. Tony?
I just discovered v. 0.16 sources at the ocrad site. This being my first CygWin experience, I couldn't make the -mno-cygwin switch work, so I compiled one with the POSIX (?) emulation dll. I put it here for download. It will stay online until Thursday.

www.unlockaarhus.dk/dev/cygwin1.zip
www.unlockaarhus.dk/dev/ocrad.zip

Maybe someone could post it to the CVS if you find it appropriate? To use it, download both files and extract them into the same folder.

There is not much difference in the OCR output, but this version did not have any problem opening the pnm files I tried. (I saw you mention elsewhere that this may have to do with POSIX / Win32 differences, so maybe the dll is the reason this one works?)

--
Best regards
Vibe
>> ... Note also that installing PIL on Windows is no different than
>> installing SpamBayes or Python. Just double-click the installer's
>> icon.

Vibe> I know, but in the beginning I did only use the SB exe. Only
Vibe> recently I did install the CygWin, Python, PIL etc. to see what it
Vibe> was like.

Cygwin's not needed. That's only necessary for people who need/want to compile ocrad. Python is included with the Outlook plugin binary installer. You'd only need to install Python if you wanted to install it from source. In the long run (once the current crop of problems are worked out) I'll see what I can do to make a separate PIL installation unnecessary.

Vibe> Fact is I chose SpamBayes over other filters because a compiled
Vibe> exe was available. If it was only available as interpreted py
Vibe> sources I probably wouldn't have jumped in. Of course now I know
Vibe> how great it is ;)

At this point in time you're working with alpha-level software. (New functionality, likely to contain bugs, etc.) You are more than welcome to wait for better integration.

Vibe> Actually I think useability is often overlooked by you wizards ;)
Vibe> I really wanted to install and compile with MinGW but I had to
Vibe> give up... couldn't figure out which of the ten or so different
Vibe> packages I had to download and how to use them...

I have no idea how to do anything with mingw.

>> Okay, I'll have to presume Tony built with 0.14 sources. Tony?

Vibe> I just discovered v. 0.16 sources at the ocrad site. Being my
Vibe> first CygWin experience I couldn't make the -nmo-cygwin switch
Vibe> work so I compiled one with the POSIX (?) emulation dll.
Vibe> I put it here for download. It will stay online until thursday.
Vibe> www.unlockaarhus.dk/dev/cygwin1.zip
Vibe> www.unlockaarhus.dk/dev/ocrad.zip
Vibe> Maybe some one could post it to the CVS if you find it
Vibe> appropriate? To use it download both files and extract in same
Vibe> folder.

I'll take a look when I have a few minutes.

Thx,
Skip
Kasper> Third: You seem to have the scale parameter format wrong, that's
Kasper> why the output is so bad.

Skip> The -2 I used was suggested by the ocrad author. I've tried a few
Skip> other scales. It's not obvious that one scale is much better than
Skip> another. Still, I should make this a user-settable option.

Kasper> Here are two test runs. First one with incorrect parameter
Kasper> format as in your example. Second with correct scaling
Kasper> parameter:
...

Skip> Whoops! Thanks, I'll fix that pronto.

Belay that. Here's what's in the SpamBayes source:

    scale = options["Tokenizer", "ocrad_scale"] or 1
    charset = options["Tokenizer", "ocrad_charset"]
    ...
    ocr = os.popen("ocrad -s %s -c %s -x %s -f %s 2>/dev/null" %
                   (scale, charset, orf, pnmfile))

So I already allow the user to adjust the scaling factor and properly use the -s flag. I think the incorrect usage was confined to my postings.

Skip
Hi again,
Kasper> Here are two test runs. First one with incorrect parameter Kasper> format as in your example. Second with correct scaling Kasper> parameter: ... Skip> Whoops! Thanks, I'll fix that pronto.
Belay that. Here's what's in the SpamBayes source:
scale = options["Tokenizer", "ocrad_scale"] or 1
charset = options["Tokenizer", "ocrad_charset"]
...
ocr = os.popen("ocrad -s %s -c %s -x %s -f %s 2>/dev/null" %
               (scale, charset, orf, pnmfile))
So I already allow the user to adjust the scaling factor and properly use the -s flag. I think the incorrect usage was confined to my postings.
Ok, I see.

What is the meaning of the last '2' in the os.popen()-call? If I send that to ocrad after the input-file I get the ocr followed by 'Cannot open 2'.

Also on my system I get different output formats with those calls:

    ocrad -s2 -x ocr.txt Clipboard01.pgm
    ocrad -s2 > ocr.txt Clipboard01.pgm

Of course I trust you know which one to use. Just telling you this in case it is something that was overlooked.

I will continue testing until we get this working on Windows.

--
Best regards
Vibe
On Mon, Oct 23, 2006 at 06:13:51PM +0200, Vibe Grevsen wrote:

|| Hi again,
||
|| > Kasper> Here are two test runs. First one with incorrect parameter
|| > Kasper> format as in your example. Second with correct scaling
|| > Kasper> parameter:
|| > ...
|| > Skip> Whoops! Thanks, I'll fix that pronto.
|| >
|| > Belay that. Here's what's in the SpamBayes source:
|| >
|| > scale = options["Tokenizer", "ocrad_scale"] or 1
|| > charset = options["Tokenizer", "ocrad_charset"]
|| > ...
|| > ocr = os.popen("ocrad -s %s -c %s -x %s -f %s 2>/dev/null" %
|| > (scale, charset, orf, pnmfile))
|| >
|| > So I already allow the user to adjust the scaling factor and properly use
|| > the -s flag. I think the incorrect usage was confined to my postings.
||
|| Ok, I see.
||
|| What is the meaning of the last '2' in the os.popen()-call?
|| If I send that to ocrad after the input-file I get the ocr followed by 'Cannot open 2'.

2>/dev/null is /bin/sh syntax for redirecting standard error to /dev/null.

Csh or a derivative will not recognize 2> as special, but pass 2 as argument to ocrad. Then ocrad will interpret it as a filename and try to open that file, with the shown consequence.

--
Vincent Zweije <zweije@xs4all.nl> | "If you're flamed in a group you
<http://www.xs4all.nl/~zweije/> | don't read, does anybody get burnt?"
[Xhost should be taken out and shot] | -- Paul Tomblin on a.s.r.
>> ocr = os.popen("ocrad -s %s -c %s -x %s -f %s 2>/dev/null" %

Vibe> What is the meaning of the last '2' in the os.popen()-call?

It's a Unix-ism that will probably not work on Windows. It sends error messages to the bit bucket.

Vibe> If I send that to ocrad after the input-file I get the ocr
Vibe> followed by 'Cannot open 2'.

Yeah, I'll have to fix that. Recall that I don't have Windows available. I have to rely on folks like you to give me feedback. Eventually I hope one of the other SpamBayes developers will find some free time to work on it. I am going to post a solicitation to comp.lang.python looking for another volunteer though as well.

Skip
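Editor's sketch: one way to get the effect of `2>/dev/null` without depending on any shell syntax is to let Python discard the child's standard error itself. This is not the SpamBayes code. It assumes a modern Python 3 with the `subprocess` module, and `run_quiet` is a hypothetical helper name. The demonstration uses the Python interpreter as a stand-in child process so that it runs without ocrad installed.

```python
import subprocess
import sys


def run_quiet(argv):
    """Run argv (a list of strings) and return its stdout as text.

    stderr is sent to the platform's null device by subprocess itself,
    so no shell redirection such as 2>/dev/null is involved.
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child: writes one line to stdout and one to stderr.
    child = [sys.executable, "-c",
             "import sys; print('ocr text'); sys.stderr.write('noise\\n')"]
    print(run_quiet(child))
```

Because no shell parses the command, it does not matter whether the platform's shell understands `2>`.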
Hi friends, as promised I'm continuing my tests on implementing OCR under Windows. FYI I'm running from sources recently downloaded through CVS.
ocr = os.popen("ocrad -s %s -c %s -x %s -f %s 2>/dev/null" %
Vibe> What is the meaning of the last '2' in the os.popen()-call?
It's a Unix-ism that will probably not work on Windows. It sends error messages to the bit bucket.
Ok, I did a little read-up on this.

2> is supported by WinNT, 2k and XP; I just never saw it used before. 2> is not supported in Win9x and ME.

However, /dev/null is - of course - not found in Windows. The equivalent is nul (case insensitive). Better to use os.path.devnull as shown here. Parentheses are required for the string formatting!

    ocr = os.popen( ( "ocrad -s %s -c %s -x %s < %s 2>" + os.path.devnull ) %
                    (scale, charset, orf, pnmfile))

Now the surprise is that this executes 100% correctly from the interpreter, but it does not when SpamBayes runs. I still need to check exactly what is going on in SpamBayes here.

Maybe you could hint at other parts of the sources I should check for the next lead?

Finally, I was surprised to find that

    ocrad -s4 -x out.txt >ocr.txt logo.pgm

did produce an ocr.txt but no out.txt for this image: http://www.unlockaarhus.dk/dev/logo.pgm.

Maybe it's only a problem with small images? Could you please test if this is the case under Unix as well?

Happy coding :)
Vibe
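Editor's sketch of why the parentheses matter: in Python, `%` binds more tightly than `+`, so the null-device name must be appended to the template before formatting. The function name `ocrad_command` and the example arguments are hypothetical; `os.devnull` is the documented spelling of the constant that the message reaches through `os.path.devnull`.

```python
import os


def ocrad_command(scale, charset, pnmfile):
    """Build a shell command string that discards ocrad's stderr.

    The null-device name is concatenated onto the template first and the
    result is formatted afterwards, which is what the parentheses in the
    one-line form achieve.
    """
    template = 'ocrad -s %s -c %s "%s" 2>' + os.devnull
    return template % (scale, charset, pnmfile)


if __name__ == "__main__":
    print(ocrad_command(2, "ascii", "image.pnm"))
    try:
        # Without parentheses, "%" is applied to os.devnull on its own.
        "ocrad -s %s 2>" + os.devnull % (2,)
    except TypeError as exc:
        print("unparenthesized form fails:", exc)
```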
On Thu, 2006-11-02 at 15:23 +0100, Vibe Grevsen wrote:
Hi friends,
as promised I'm continuing my tests on implementing OCR under Windows. FYI I'm running from sources recently downloaded through CVS.
ocr = os.popen("ocrad -s %s -c %s -x %s -f %s 2>/dev/null" %
Vibe> What is the meaning of the last '2' in the os.popen()-call?
It's a Unix-ism that will probably not work on Windows. It sends error messages to the bit bucket.
Ok, I did a little read-up on this.
2> is supported by WinNT, 2k and XP; I just never saw it used before. 2> is not supported in Win9x and ME.
However /dev/null is - of course - not found in Windows. Equivalent is nul (case insensitive). Better use os.path.devnull like shown here. Parenthesis required for string formatting!
ocr = os.popen( ( "ocrad -s %s -c %s -x %s < %s 2>" + os.path.devnull ) % (scale, charset, orf, pnmfile))
or better, use os.popen3 and discard the stderr output.

On Windows you have to put quotes around pnmfile to protect against spaces in the path (on Linux you should have them too, but you are unlikely to get a path with a space).

On Windows there is also another caveat: you should put quotes around the ocrad path as well, but if you do that you have to quote the whole command. To explain, the command should be:

    ocr_cmd = r'""ocrad_path" -s %s -c %s "%s""'%(scale, charset, pnmfile)
    fin, fout, ferrr = os.popen3(ocr_cmd)

but that doesn't work on Linux.

If you quote only ocrad_path or only pnmfile, you don't need the quotes around the command as a whole.

You can work around this (as you have done) by putting ocrad in the PATH and not quoting it. In this case you need to quote only pnmfile, and it works on both Linux and Windows.
Now the surprise is that this executes 100% correctly from the interpreter, but it does not when spambayes runs. I still need to check up on exactly what is going on in Spambayes here.
Maybe you could hint on other parts of the sources I should check for the next lead?
Finally I was surprised to find that
ocrad -s4 -x out.txt >ocr.txt logo.pgm
did produce an ocr.txt but no out.txt for this image http://www.unlockaarhus.dk/dev/logo.pgm.
Maybe it's only a problem with small images? Could you please test if this is the case under Unix as well?
Using -s (and other flags as well) disables -x. The orf file is never used; it has probably been there from the start, before Skip introduced the scale parameter.
Happy coding :)
Vibe _______________________________________________ SpamBayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html
-- Luigi Pugnetti Symbolic S.p.A. V.le Mentana, 29 I-43100 Parma Italy Tel: +39 0521 708811 Fax: +39 0521 776190
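Editor's sketch: the quoting rules discussed in this message only arise when the command is passed to a shell as a single string. Assuming a modern Python 3, passing the arguments as a list sidesteps the problem, because each list item reaches the child as exactly one argument even when it contains spaces. `run_with_file` is a hypothetical name, and a child Python process stands in for ocrad so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_with_file(program_argv, path):
    """Run program_argv with path appended as one argument; return stdout."""
    completed = subprocess.run(
        list(program_argv) + [path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # The directory and file names deliberately contain spaces.
    with tempfile.TemporaryDirectory(prefix="ocr test ") as tmpdir:
        path = os.path.join(tmpdir, "image file.txt")
        with open(path, "w") as handle:
            handle.write("payload")
        # Stand-in for ocrad: a child Python that prints the file it is given.
        reader = [sys.executable, "-c",
                  "import sys; print(open(sys.argv[1]).read())"]
        print(run_with_file(reader, path))
```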
Thanks Luigi & Vibe for the feedback. I'll try to make those changes to the code this evening and check into CVS. Skip
Okay, I'm finally actually editing the necessary files.

>> 2> is supported by WinNT, 2k and XP I just never saw it used before.
>> 2> is not supported in Win9x and ME.

I don't think we care about Win9x or WinME (though someone should feel free to demonstrate my ignorance here).

>> However /dev/null is - of course - not found in Windows. Equivalent
>> is nul (case insensitive). Better use os.path.devnull like shown
>> here. Parenthesis required for string formatting!

Correct. Will be checked in shortly.

Luigi> On windows you have to put quote around pnmfile to protect
Luigi> against space in path ...

Not a problem here, since the pnmfile is named using the tempfile.mkstemp function. It won't contain any characters which require special treatment.

>> Finally I was surprised to find that
>>
>> ocrad -s4 -x out.txt >ocr.txt logo.pgm
>>
>> did produce an ocr.txt but no out.txt for this image
>> http://www.unlockaarhus.dk/dev/logo.pgm.
>>
>> Maybe it's only a problem with small images? Could you please test if
>> this is the case under Unix as well?

Luigi> using -s (and other flags as well) disable -x.

Hmmm... That sucks. I see the lines in ocrad's code where that happens. I mailed a note to bug-ocrad asking why this is so. Hopefully it's just a simple bug that can be squashed.

Luigi> orf file is never used. probably is there from the start before
Luigi> skip introduce the scale parameter

Actually, yes, it is used:

    for line in open(orf):
        if line.startswith("lines"):
            nlines = int(line.split()[1])
            if nlines:
                ctokens.add("image-text-lines:%d" % int(log2(nlines)))

so no image-text-lines:NN tokens are generated.

Skip
>>> However /dev/null is - of course - not found in Windows. Equivalent
>>> is nul (case insensitive). Better use os.path.devnull like shown
>>> here. Parenthesis required for string formatting!

skip> Correct. Will be checked in shortly.

More Windows-friendly executable location and program execution has been checked in for ImageStripper.py.

Luigi> orf file is never used. probably is there from the start before
Luigi> skip introduce the scale parameter

skip> Actually, yes, it is used:

skip> for line in open(orf):
skip>     if line.startswith("lines"):
skip>         nlines = int(line.split()[1])
skip>         if nlines:
skip>             ctokens.add("image-text-lines:%d" %
skip>                         int(log2(nlines)))

skip> so no image-text-lines:NN tokens are generated.

But it seemed better to use the count() suggestion, so I did. Now we don't care if the -s/-x thing in ocrad is a bug or not. ;-)

Skip
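Editor's sketch: the thread does not show the code that was actually checked in, so the following is only one plausible reading of "the count() suggestion". It assumes the line count is taken from the OCR text itself with `str.count` and then bucketed with `log2` in the same way as the quoted snippet. The function name `image_text_lines_token` is hypothetical.

```python
from math import log2


def image_text_lines_token(ocr_text):
    """Return an "image-text-lines:N" token for OCR output, or None.

    N is the integer part of log2(number of newline characters), which
    buckets line counts the same way as the snippet quoted above.
    """
    nlines = ocr_text.count("\n")
    if not nlines:
        return None
    return "image-text-lines:%d" % int(log2(nlines))


if __name__ == "__main__":
    print(image_text_lines_token("first line\nsecond line\nthird\nfourth\n"))
```

Counting on the OCR text removes the dependency on the `-x` export file, which is why the `-s`/`-x` interaction no longer matters.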
Hi again, good news - I fiddled a bit more and got it working under Windows :) :) :)
ocr = os.popen( ( "ocrad -s %s -c %s -x %s < %s 2>" + os.path.devnull ) % (scale, charset, orf, pnmfile))
or better use os.popen3 and discard stderr output.
os.popen3() does not seem to support the read()-method?
On windows you have to put quote around pnmfile to protect against space in path (also un linux you should have them but it's unlikely you get a path with a space).
Oh, YES, you're absolutely right. Thank you for this suggestion.
On windows there is also an other caveat. you should put quote also around ocrad path but if you do that you have to quote everything. to explain the command should be:
ocr_cmd = r'""ocrad_path" -s %s -c %s "%s""'%(scale, charset, pnmfile)
fin, fout, ferrr = os.popen3(ocr_cmd)
I tested your suggestion, but it seemed to resolve wrong in the interpreter. Also, popen3() could not be read(), so I changed it a bit:

    # u: unicode support, r: raw string
    ocr_cmd = ur'ocrad -s %s -c %s "%s"' % (scale, charset, pnmfile)
    ocr = os.popen( ocr_cmd )

I also tested this:

    # u: unicode support, r: raw string
    ocr_cmd = ( ur'ocrad -s %s -c %s < "%s" 2>' + os.path.devnull ) % (scale, charset, pnmfile)
    ocr = os.popen( ocr_cmd )

Both work in Windows, so Skip can pick whichever he likes best ;)
Maybe you could hint on other parts of the sources I should check for the next lead?
With the above change I only had to do one more thing... Comment out the check for ocrad, then OCR is working. (Assuming ocrad 0.16 is in the path.) This means that we should probably work on testing the find_program and is_executable procedures. As soon as they are finished I could probably start on a new exe-installer version. I think I figured out how to include PIL in the exe as well.
ocrad -s4 -x out.txt >ocr.txt logo.pgm did produce an ocr.txt but no out.txt for this image http://www.unlockaarhus.dk/dev/logo.pgm.
using -s (and other flags as well) disable -x.
Hmm, bug, no, better undocumented feature? :) (At least it's not explained in the ocrad readme as far as I can see...)
orf file is never used. probably is there from the start before skip introduce the scale parameter
Actually he tries to count the number of lines in orf I think

    for line in open(orf):
        ...

But this could of course be done directly on ocr.read().

Happy coding :)
Vibe
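Editor's sketch: the thread names `find_program` and `is_executable` but never shows their bodies, so the versions below are hypothetical stand-ins and not the SpamBayes implementations. They assume a modern Python 3, where `shutil.which` already handles the Windows-specific part (trying the `PATHEXT` extensions such as `.exe`).

```python
import os
import shutil
import sys


def find_program(name, search_path=None):
    """Return the full path of an executable called name, or None.

    search_path is an os.pathsep-separated string; None means use PATH.
    """
    return shutil.which(name, path=search_path)


def is_executable(path):
    """True if path is an existing regular file that may be executed."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Look for the running interpreter in its own directory as a demo.
    exe_dir, exe_name = os.path.split(sys.executable)
    print(find_program(exe_name, search_path=exe_dir))
    print(find_program("ocrad"))  # None unless ocrad is on PATH
```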
On Fri, 2006-11-03 at 02:10 +0100, Vibe Grevsen wrote:
Hi again,
good news - I fiddled a bit more and got it working under Windows :) :) :)
Hello,
ocr = os.popen( ( "ocrad -s %s -c %s -x %s < %s 2>" + os.path.devnull ) % (scale, charset, orf, pnmfile))
or better use os.popen3 and discard stderr output.
os.popen3() does not seem to support the read()-method?
Are you sure? I have used it on at least a couple of versions of Python. If you look at the code snippet I sent yesterday, you will see a few commented-out lines that call ocrad using popen3, and it works (at least on the few computers I tested it on).
On windows you have to put quote around pnmfile to protect against space in path (also un linux you should have them but it's unlikely you get a path with a space).
Oh, YES, you're absolutely right. Thank you for this suggestion.
On windows there is also an other caveat. you should put quote also around ocrad path but if you do that you have to quote everything. to explain the command should be:
ocr_cmd = r'""ocrad_path" -s %s -c %s "%s""'%(scale, charset, pnmfile)
fin, fout, ferrr = os.popen3(ocr_cmd)
I tested your suggestion, but it seemed to resolve wrong in the interpreter. Also popen3() could not be read() so I changed it a bit
Strange. It works for me.
# u: unicode support, r: raw string
ocr_cmd = ur'ocrad -s %s -c %s "%s"' % (scale, charset, pnmfile)
ocr = os.popen( ocr_cmd )
I also tested this
# u: unicode support, r: raw string
ocr_cmd = ( ur'ocrad -s %s -c %s < "%s" 2>' + os.path.devnull ) % (scale, charset, pnmfile)
ocr = os.popen( ocr_cmd )
Both working in windows so Skip can pick whichever he likes best ;)
Does 2> really work? I think it kind of works because it's ignored by cmd.exe but it's a unix sh construct (I'm not really sure if it even works using csh derived shells)
Maybe you could hint on other parts of the sources I should check for the next lead?
With the above change I only had to do one more thing... Comment out the check for ocrad, then OCR is working. (Assuming ocrad 0.16 is in the path.)
This means that we should probably work on testing the find_program and is_executable procedures. As soon as they are finished I could probably start on a new exe-installer version. I think I figured out how to include PIL in the exe as well.
ocrad -s4 -x out.txt >ocr.txt logo.pgm did produce an ocr.txt but no out.txt for this image http://www.unlockaarhus.dk/dev/logo.pgm.
using -s (and other flags as well) disable -x.
Hmm, bug, no, better undocumented feature? :)
(At least it's not explained in the ocrad readme as far as I can see...)
Oh yes, you have to look inside the source code to find it.
orf file is never used. probably is there from the start before skip introduce the scale parameter
Actually he tries to count the number of lines in orf I think
No, he looks for a line starting with "lines", which is probably related to the number of lines in the output.
for line in open(orf): ...
But this could of course be done directly on ocr.read().
Happy coding :)
Vibe
-- Luigi Pugnetti Symbolic S.p.A. V.le Mentana, 29 I-43100 Parma Italy Tel: +39 0521 708811 Fax: +39 0521 776190
participants (6)
- Kasper Vibe Grevsen
- Luigi Pugnetti
- Skip Montanaro
- skip@pobox.com
- Vibe Grevsen
- vzweije@zweije.nl.eu.org