technique to enter text using a mobile phone keypad (T9 dictionary-based disambiguation)

Wed Aug 9 18:54:30 EDT 2006

bearophileHUGS at lycos.com wrote:
> I've tested that sorting just the strings instead of the tuples (and
> removing the stripping) reduces the running time enough:
>
>     def __init__(self):
>         numbers = '22233344455566677778889999'
>         conv = string.maketrans(string.lowercase, numbers)
>         lines = file("/usr/share/dict/words").read().lower().splitlines()
>         # lines = map(str.strip, lines)
>         lines.sort()
>         self.dict = [(word.translate(conv), word) for word in lines]
>
> If the words file is already sorted you can skip the sorting line.
> If the file contains extraneous spaces, you can strip them by
> uncommenting that line.
>

1. Wouldn't it be a good idea to process the raw dictionary *once* and
cPickle the result?

2. All responses so far seem to have missed a major point in the
research paper quoted by the OP: each word has a *frequency* associated
with it. When there are multiple choices (e.g. "43" -> ["he", "if",
"id", ...]), the user is presented with the choices in descending
frequency order. Note that if one of the sort keys is (-frequency), the
actual frequency doesn't need to be retained in the prepared
dictionary.

3. Anyone interested in the techniques & heuristics involved in this
type of exercise might like to look at input methods for languages like
Chinese -- instead of 26 letters mapped to 8 digits, you have tens of
thousands of characters of wildly varying frequency mapped to e.g. 400+
Pinyin "words" entered on a "standard" keyboard.
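
Points 1 and 2 can be sketched together roughly as follows. This is only an illustration in present-day Python 3, where cPickle has been folded into the pickle module. The sample words, the frequency counts and the helper names build, save, load and lookup are all made up for the example; a real version would read the word list and a frequency table from files.

```python
import os
import pickle
import tempfile
from bisect import bisect_left

# Same letter-to-key mapping as the quoted code: a-z -> keys 2-9.
LETTERS = "abcdefghijklmnopqrstuvwxyz"
NUMBERS = "22233344455566677778889999"
CONV = str.maketrans(LETTERS, NUMBERS)

# Hypothetical (word, frequency) pairs. The counts are invented for
# illustration and stand in for a real frequency list.
SAMPLE = [("if", 300), ("he", 500), ("id", 20),
          ("good", 400), ("home", 350), ("gone", 150)]


def build(word_freqs):
    """Return (digits, word) pairs sorted by digits, then by descending
    frequency. The frequency is used only as a sort key and is not
    kept in the result."""
    keyed = sorted((w.translate(CONV), -f, w) for w, f in word_freqs)
    return [(digits, word) for digits, _negfreq, word in keyed]


def save(table, path):
    """Pickle the prepared table so the raw dictionary is processed once."""
    with open(path, "wb") as fh:
        pickle.dump(table, fh, pickle.HIGHEST_PROTOCOL)


def load(path):
    """Load a table previously written by save()."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def lookup(table, digits):
    """Return the words for a digit string, most frequent first."""
    # (digits, "") sorts before every (digits, word) entry, so this
    # finds the first entry for the given key sequence.
    i = bisect_left(table, (digits, ""))
    words = []
    while i < len(table) and table[i][0] == digits:
        words.append(table[i][1])
        i += 1
    return words


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "t9.pickle")
    save(build(SAMPLE), path)       # done once, ahead of time
    print(lookup(load(path), "43"))  # ['he', 'if', 'id']
```

Because the table is already in (digits, -frequency) order when it is pickled, the program that loads it needs neither the frequencies nor another sort.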

Cheers,
John