[Spambayes-checkins] spambayes/spambayes tokenizer.py,1.33,1.34

Fri Jan 21 05:41:42 CET 2005

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3790

Modified Files:
	tokenizer.py 
Log Message:
Work around a bug in the csv module: it will happily write CSV files whose
fields contain \r characters but then refuses to read them back.  This has
been fixed in Python 2.5 but is still present in 2.3.4 and 2.4.0.  It only
affects SpamBayes if you use sb_dbexpimp.py to export a database to a CSV
file and later try to import it.  I believe this is the only place where a
fix is necessary, because \r characters can only appear in tokens generated
from the Subject header.
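
For illustration only, here is a minimal sketch of the idea behind the
workaround, written for a current Python rather than 2.3/2.4.  The helper
names sanitize_subject and csv_round_trip are hypothetical and do not exist
in SpamBayes; the row layout (token, spam count, ham count) is likewise an
assumption.  The sketch does not reproduce the old csv bug; it only shows
that a token with its carriage returns replaced by spaces survives a
write/read cycle unchanged.

```python
import csv
import io


def sanitize_subject(subject):
    """Replace carriage returns with spaces, as the patch does to x."""
    return subject.replace('\r', ' ')


def csv_round_trip(rows):
    """Write rows to an in-memory CSV buffer and read them back."""
    # newline='' stops the buffer from translating line endings itself,
    # which is what the csv module expects from its file objects.
    buf = io.StringIO(newline='')
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


if __name__ == '__main__':
    # Hypothetical export row: token, spam count, ham count.
    token = sanitize_subject('free\rmoney')
    print(csv_round_trip([[token, '3', '1']]))
```

Doing the replacement at tokenization time, as the patch does, means no
token containing \r ever reaches the database, so the export/import
scripts themselves need no changes.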


Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** tokenizer.py	29 Oct 2004 00:14:42 -0000	1.33
--- tokenizer.py	21 Jan 2005 04:41:40 -0000	1.34
***************
*** 1327,1330 ****
--- 1327,1333 ----
              if subjcharset is not None:
                  yield 'subjectcharset:' + subjcharset
+             # this is a workaround for a bug in the csv module in Python
+             # <= 2.3.4 and 2.4.0 (fixed in 2.5)
+             x = x.replace('\r', ' ')
              for w in subject_word_re.findall(x):
                  for t in tokenize_word(w):