tim> Straight character n-grams are very appealing because they're the
tim> simplest and most language-neutral; I didn't have any luck with
tim> them over the weekend, but the size of my training data was
tim> trivial.

Anybody up for pooling corpi (corpora?)?

Skip
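For anyone following along, "straight character n-grams" means tokenizing a message into every overlapping run of n characters, with no word splitting or other language-specific rules. That absence of rules is why the approach is language-neutral. The sketch below is only an illustration and not the tokenizer discussed in the thread. The function name `char_ngrams` and the default `n = 5` are assumptions chosen for the example.

```python
def char_ngrams(text: str, n: int = 5) -> list[str]:
    """Return every overlapping run of n characters in text.

    Hypothetical helper for illustration; n=5 is an arbitrary default.
    Texts shorter than n produce an empty list.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of width n across the string, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("spam!", 3))  # ['spa', 'pam', 'am!']
```

Note that punctuation and whitespace become part of the features. With a small training corpus, most of these n-grams are seen only once or twice, which fits the observation above that trivial amounts of data make the results hard to judge.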