
From: Delaney, Timothy [mailto:tdelaney@avaya.com]
Whether any weighting should be applied to single words or word pairs I don't know - my gut feeling is that they should be weighted the same, but guts are no replacement for empirical evidence.
On second thought - if a word-pair appears, then the separate parts should not be checked as separate words. So, if I had scores:

    'free'               0.1
    'beer'               0.1
    ('want', 'free',)    0.9
    ('free', 'beer',)    0.01
    ('free', '!!!',)     0.99

then the following phrases would match (case-folding) as:

    'I want free beer!!!':
        ('want', 'free',)    0.9
        ('free', 'beer',)    0.01

    'Get *** for free!!!':
        ('free', '!!!',)     0.99

    'I want free beer. Free the beer!!!':
        ('want', 'free',)    0.9
        ('free', 'beer',)    0.01
        'free'               0.1
        'beer'               0.1

Damn I wish I was at home to try this out ... :(

Tim Delaney
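
A minimal Python sketch of the matching rule described above, for anyone who wants to try it: known word pairs are matched first, and a single word is only scored where it is not part of a matched pair. The names `SCORES`, `tokenize` and `match` are made up for illustration, and the tokenizer (lowercase, with runs of punctuation such as '!!!' split off as their own tokens) is an assumption, since the message does not say how tokens are produced. Overlapping pairs are both reported, as in the first example phrase.

```python
import re

# Scores from the message: plain strings are single words, tuples are word pairs.
SCORES = {
    "free": 0.1,
    "beer": 0.1,
    ("want", "free"): 0.9,
    ("free", "beer"): 0.01,
    ("free", "!!!"): 0.99,
}


def tokenize(text):
    """Case-fold and split into words and runs of punctuation (assumed rule)."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']+", text.lower())


def match(text, scores=SCORES):
    """Return (token_or_pair, score) hits, with pairs taking precedence."""
    tokens = tokenize(text)
    covered = set()  # positions of tokens that belong to a matched pair
    hits = []

    # First pass: every adjacent pair that has a score.
    for i in range(len(tokens) - 1):
        pair = (tokens[i], tokens[i + 1])
        if pair in scores:
            hits.append((pair, scores[pair]))
            covered.update((i, i + 1))

    # Second pass: single words, skipping positions already used by a pair.
    for i, tok in enumerate(tokens):
        if i not in covered and tok in scores:
            hits.append((tok, scores[tok]))

    return hits


for phrase in (
    "I want free beer!!!",
    "Get *** for free!!!",
    "I want free beer. Free the beer!!!",
):
    print(repr(phrase), match(phrase))
```

Coverage is tracked per token position rather than per word, which is why the second 'free' and 'beer' in the third phrase are still scored on their own. Repeated occurrences are reported each time they appear; whether duplicates should be collapsed is left open here.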