referencing a subhash for generalized ngram counting
Scott David Daniels
Scott.Daniels at Acm.Org
Tue Nov 13 22:25:06 EST 2007
braver wrote:
> ...
> The real-life motivation for this is n-gram counting. Say you want to
> maintain a hash for bigrams. For each two subsequent words a, b in a
> text, you do
> bigram_count[a][b] += 1
This application is easily handled with tuples as keys:
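    bigrams = {}
    src = iter(source)  # source: any iterable of words
    lag = next(src)     # first word; raises StopIteration if source is empty
    for current in src:
        # Key each count by the (previous word, current word) pair.
        bigrams[lag, current] = bigrams.get((lag, current), 0) + 1
        lag = current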
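For what it's worth, the same tuple-key idea stretches to n-grams of any length. The following is only a sketch under a few assumptions: it uses collections.Counter from current Python rather than a plain dict, the helper name ngram_counts is made up for illustration, and it copies the source into a list, so it suits texts that fit in memory.

```python
from collections import Counter


def ngram_counts(source, n=2):
    """Count n-grams in source, keyed by n-tuples (hypothetical helper)."""
    words = list(source)  # materialize so we can slice at n offsets
    # zip stops at the shortest slice, so each tuple is a full n-gram:
    # (words[0], ..., words[n-1]), (words[1], ..., words[n]), and so on.
    return Counter(zip(*(words[i:] for i in range(n))))


counts = ngram_counts("a b a b c".split(), n=2)
trigram_counts = ngram_counts("a b a b c".split(), n=3)
```

Here counts[("a", "b")] is 2, and a pair that never occurred reads as 0 without raising KeyError, since Counter supplies a zero for missing keys.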
But if you really want nested dictionaries:
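    bigrams = {}
    src = iter(source)  # source: any iterable of words
    lag = next(src)     # first word; raises StopIteration if source is empty
    for current in src:
        # Create the inner dict for lag on first sight, then read the count.
        count = bigrams.setdefault(lag, {}).get(current, 0)
        bigrams[lag][current] = count + 1
        lag = current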
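If the goal is to keep the literal bigram_count[a][b] += 1 spelling from the question, a nested collections.defaultdict is one way to get there. This is a rough sketch, not the only approach; the function name nested_bigram_counts is invented for the example, and an empty source is assumed to yield an empty result rather than an error.

```python
from collections import defaultdict


def nested_bigram_counts(source):
    """Return {first_word: {second_word: count}} (hypothetical helper)."""
    # The outer dict creates inner dicts on demand; inner dicts default to 0.
    bigrams = defaultdict(lambda: defaultdict(int))
    src = iter(source)
    lag = next(src, None)  # None if source is empty; the loop then never runs
    for current in src:
        bigrams[lag][current] += 1
        lag = current
    return bigrams


nested = nested_bigram_counts("a b a b c".split())
followers_of_b = nested["b"]  # the subhash of words seen after "b"
```

One thing to watch: merely reading nested[x] for an unseen x inserts an empty inner dict, so test membership with "x in nested" when you only want to look.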
-Scott David Daniels
Scott.Daniels at Acm.Org