[Python-Dev] The first trustworthy <wink> GBayes results
Delaney, Timothy
tdelaney@avaya.com
Mon, 2 Sep 2002 08:53:39 +1000
> From: Tim Peters [mailto:tim.one@comcast.net]
>
> Training GBayes is cheap, and the more you feed it the less need to do
> information-destroying transformations (like folding case or ignoring
> punctuation).
Speaking of which, I had a thought this morning (in the shower of course ;)
about a slightly more intelligent tokeniser.
Split on whitespace, then split any run of punctuation at the end of a
"word" off as a separate token. A "word" that is nothing but punctuation
stays whole.
So:
    a.b.c              -> 'a.b.c' (main use: keeps file extensions with filenames)
    A phrase.          -> 'A', 'phrase', '.'
    WTF???             -> 'WTF', '???'
    >>> import module  -> '>>>', 'import', 'module'
Might this be useful? No code of course ;)
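That said, a minimal sketch of the rule above could look something like the
following. It is only an illustration, not anything from GBayes itself: the
function name tokenize is made up here, and it assumes "punctuation" means
the ASCII characters in Python's string.punctuation, which the description
above does not actually pin down.

```python
import string


def tokenize(text, punctuation=string.punctuation):
    """Split on whitespace, then peel trailing punctuation off each word.

    Hypothetical helper: the name and the choice of string.punctuation
    are assumptions made for this sketch.
    """
    tokens = []
    for word in text.split():
        # Everything left after stripping punctuation from the right end.
        stem = word.rstrip(punctuation)
        # The run of trailing punctuation that was stripped (may be empty).
        tail = word[len(stem):]
        if stem and tail:
            # Ordinary word followed by punctuation: emit two tokens.
            tokens.extend([stem, tail])
        else:
            # No trailing punctuation, or the word is all punctuation
            # (e.g. '>>>'): keep it as a single token.
            tokens.append(word)
    return tokens


print(tokenize("a.b.c"))              # ['a.b.c']
print(tokenize("A phrase."))          # ['A', 'phrase', '.']
print(tokenize("WTF???"))             # ['WTF', '???']
print(tokenize(">>> import module"))  # ['>>>', 'import', 'module']
```

Only the right-hand end of each word is touched, so interior punctuation
(as in a.b.c) and leading punctuation (as in "(hello") are left attached.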
Tim Delaney