technique to enter text using a mobile phone keypad (T9 dictionary-based disambiguation)
John Machin
sjmachin at lexicon.net
Wed Aug 9 18:54:30 EDT 2006
bearophileHUGS at lycos.com wrote:
> I've tested that sorting just the strings instead of the tuples (and
> removing the stripping) reduces the running time enough:
>
> def __init__(self):
>     numbers = '22233344455566677778889999'
>     conv = string.maketrans(string.lowercase, numbers)
>     lines = file("/usr/share/dict/words").read().lower().splitlines()
>     # lines = map(str.strip, lines)
>     lines.sort()
>     self.dict = [(word.translate(conv), word) for word in lines]
>
> If the words file is already sorted, you can skip the sorting line.
> If the file contains extraneous spaces, you can strip them by
> uncommenting that line.
>
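The thread doesn't show the lookup side, so here is a minimal Python 3 sketch of it, assuming the table is queried by binary search on the digit string. `str.maketrans` replaces the Python 2 `string.maketrans`/`string.lowercase` pair. The class name `T9Dict` and method `lookup` are illustrative, not from the thread. Because `bisect` only works on a list sorted by the search key, this version sorts the (digits, word) pairs on the digits rather than sorting the raw words.

```python
import bisect

# Standard phone keypad: 2=abc, 3=def, 4=ghi, 5=jkl, 6=mno, 7=pqrs, 8=tuv, 9=wxyz
KEYS = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                     "22233344455566677778889999")


class T9Dict:  # hypothetical name, for illustration only
    def __init__(self, words):
        # Pair each (lowercase) word with its digit sequence and sort on the
        # digits, so all words sharing a key sequence sit next to each other.
        # Non-letters (e.g. apostrophes) pass through translate() unchanged.
        pairs = sorted((w.translate(KEYS), w) for w in words)
        self.keys = [k for k, _ in pairs]
        self.words = [w for _, w in pairs]

    def lookup(self, digits):
        # Find the contiguous run of entries whose key equals `digits`.
        lo = bisect.bisect_left(self.keys, digits)
        hi = bisect.bisect_right(self.keys, digits)
        return self.words[lo:hi]
```

For example, `T9Dict(["he", "if", "id", "go", "in"]).lookup("43")` returns `["he", "id", "if"]`; with no frequency information, ties are simply broken alphabetically.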
1. Wouldn't it be a good idea to process the raw dictionary *once* and
cPickle the result?
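Point 1 might look like this, assuming Python 3, where `cPickle` is no longer a separate module and `pickle` uses its C implementation automatically. The function names and the digits-to-words dict layout are illustrative choices, not something proposed in the thread.

```python
import pickle

KEYS = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                     "22233344455566677778889999")


def build_index(words):
    # One-off preprocessing: map each digit sequence to the words it can spell.
    index = {}
    for w in words:
        index.setdefault(w.lower().translate(KEYS), []).append(w)
    return index


def save_index(index, path):
    # Write the prepared index once, using the most compact pickle protocol.
    with open(path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    # Every later run just unpickles the ready-made structure.
    with open(path, "rb") as f:
        return pickle.load(f)
```

The raw word list is then parsed and translated only when `build_index` is rerun; normal start-up is a single `load_index` call.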
2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -> ["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order. Note that if one of the sort keys is (-frequency), the
actual frequency doesn't need to be retained in the prepared
dictionary.
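A sketch of the frequency point, assuming the input is an iterable of (word, frequency) pairs (a hypothetical format; the frequencies in the usage line below are made up for illustration). Sorting on (digits, -frequency) puts the most frequent word first within each key's run, after which the frequency column can be thrown away because list position already encodes the rank.

```python
import bisect

KEYS = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                     "22233344455566677778889999")


def prepare(word_freqs):
    # Sort keys: digit sequence ascending, then frequency descending.
    ranked = sorted(word_freqs,
                    key=lambda wf: (wf[0].translate(KEYS), -wf[1]))
    # The frequency is dropped here; order within each run is the ranking.
    keys = [w.translate(KEYS) for w, _ in ranked]
    words = [w for w, _ in ranked]
    return keys, words


def candidates(keys, words, digits):
    # Choices for a key sequence, already in descending frequency order.
    lo = bisect.bisect_left(keys, digits)
    hi = bisect.bisect_right(keys, digits)
    return words[lo:hi]
```

With invented counts `[("id", 5), ("he", 900), ("if", 400), ("go", 300), ("in", 1000)]`, `candidates(keys, words, "43")` yields `["he", "if", "id"]`.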
3. Anyone interested in the techniques & heuristics involved in this
type of exercise might like to look at input methods for languages like
Chinese -- instead of 26 letters mapped to 8 digits, you have tens of
thousands of characters of wildly varying frequency mapped to e.g. 400+
Pinyin "words" entered on a "standard" keyboard.
Cheers,
John
More information about the Python-list mailing list