unknown encoding us-ascii on Message.asTokens(), but
I use the asTokens method of a message to get a token list, and that sometimes fails with:

Traceback (most recent call last):
  File "mkwsbayes\interface.pyc", line 87, in sb_train
  File "mkwsbayes\UserContext.pyc", line 98, in train
  File "spambayes\message.pyc", line 187, in asTokens
  File "spambayes\message.pyc", line 199, in as_string
  File "email\Message.pyc", line 113, in as_string
  File "email\Generator.pyc", line 103, in flatten
  File "email\Generator.pyc", line 138, in _write
  File "email\Generator.pyc", line 172, in _write_headers
  File "email\Generator.pyc", line 44, in _is8bitstring
LookupError: unknown encoding: us-ascii

(Note: I'm running this from a py2exe .zip, so maybe I haven't included encodings or something?)

However, if I don't use the asTokens() method on a message and use the hammie.score function instead, it works fine on the same message that asTokens fails on.

Why is there a difference between asTokens() and not using tokens?

Since I may need to unlearn this message, I'd prefer to tokenize only once. That's why I use asTokens.

Any ideas on how to fix this?

--
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
http://www.wecanstopspam.org/ AOL-IM: BKClements
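One way to test the py2exe suspicion is to ask the interpreter directly whether it can resolve the codec named in the LookupError. The snippet below is only a diagnostic sketch: the helper name codec_available is made up for illustration, and it assumes the check is run inside the same frozen environment that shows the failure.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can resolve the codec `name`.

    `codec_available` is a hypothetical helper, not part of any library.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        # Same exception type as in the traceback: the codec registry
        # could not find a module implementing this encoding.
        return False
    return True


if __name__ == "__main__":
    # Print the status of a few common codecs, including the failing one.
    for name in ("us-ascii", "latin-1", "utf-8"):
        print(name, codec_available(name))
```

If "us-ascii" is reported as unavailable in the frozen build but available in a normal interpreter, that would point at the standard encodings package being left out of the bundle.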
On 31 Jul 2003 at 17:22, Brad Clements wrote:
Why is there a difference between asTokens() and not using tokens?
Because Message.asTokens() first uses as_string() to convert the message back to a string, then calls bayes.tokenize.

My solution is to just call bayes.tokenize directly.

--
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
http://www.wecanstopspam.org/ AOL-IM: BKClements
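The "tokenize only once" goal can still be met when calling the tokenizer directly, by keeping the resulting token list around for a later unlearn. The sketch below is illustrative only: TokenCache and toy_tokenize are hypothetical names, and toy_tokenize is a stand-in for whatever tokenizer is really used (it is not the spambayes tokenizer).

```python
class TokenCache:
    """Remember each message's tokens so it is tokenized only once.

    Hypothetical helper for illustration; `tokenize` is any callable
    that turns message text into an iterable of tokens.
    """

    def __init__(self, tokenize):
        self._tokenize = tokenize
        self._tokens = {}

    def tokens_for(self, msg_id, raw_text):
        # Tokenize on first request; afterwards reuse the stored list.
        if msg_id not in self._tokens:
            self._tokens[msg_id] = list(self._tokenize(raw_text))
        return self._tokens[msg_id]


calls = []  # records each tokenizer invocation, for demonstration only


def toy_tokenize(text):
    """Stand-in tokenizer: lowercase, whitespace-separated words."""
    calls.append(text)
    return text.lower().split()


cache = TokenCache(toy_tokenize)
train_tokens = cache.tokens_for("msg-1", "Buy cheap pills NOW")
untrain_tokens = cache.tokens_for("msg-1", "Buy cheap pills NOW")
```

Training and a later unlearn would both be fed the same stored list, so the message text is never regenerated through as_string().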