Suggest more finesse, please. I/O and sequences.
Scott David Daniels
Scott.Daniels at Acm.Org
Fri Mar 25 18:30:23 EST 2005
Qertoip wrote:
> On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:
> > ...
>> for word in line.split():
>>     try:
>>         corpus[word] += 1
>>     except KeyError:
>>         corpus[word] = 1
>
> The above is (probably) not efficient when an exception is thrown, that is,
> most of the time (for any new word). However, I've just read about the following:
>     corpus[word] = corpus.setdefault( word, 0 ) + 1
That is better for things like:
    corpus.setdefault(word, []).append(...)
You might prefer:
    corpus[word] = corpus.get(word, 0) + 1
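
To make the comparison concrete, here is a small sketch of the idioms
discussed so far. The function names, the word-position example for
.setdefault, and the sample sentence are my own illustrations, not
code from the thread, and it is written for a current Python.

```python
def count_try(words):
    """Count words with try/except, as in the quoted loop."""
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            corpus[word] = 1
    return corpus


def count_get(words):
    """Count words with dict.get and a default of 0."""
    corpus = {}
    for word in words:
        corpus[word] = corpus.get(word, 0) + 1
    return corpus


def count_setdefault(words):
    """Count words with dict.setdefault, as Qertoip suggested."""
    corpus = {}
    for word in words:
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


def positions(words):
    """Where .setdefault fits well: build a list per key, then append to it."""
    index = {}
    for i, word in enumerate(words):
        index.setdefault(word, []).append(i)
    return index


# Hypothetical sample input, just for illustration.
sample = "the cat and the hat and the bat".split()
counts = count_get(sample)
where = positions(sample)
```

All three counting functions build the same dictionary; they differ
only in how a word that is not yet in the dictionary is handled.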
The trade-off depends on the size of your test material, so you need
to time it with your own mix of words. I was thinking of cranking
through a huge body of text (where words of frequency 1 are by far
the minority case). If you run through Shakespeare's First Folio
and just do the counting part, the try-except and .get versions are
indistinguishable (2.0 sec each), and the .setdefault version
drags in at a slow 2.2 sec. Going through just Anna Karenina, the
times are 0.83, 0.83 and 0.91 sec. So the .setdefault form is about
10% slower.
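
If you want to repeat this kind of measurement yourself, a rough
harness could look like the sketch below. It is not the script I
used for the numbers above. The synthetic word list is a stand-in
I made up so the example runs by itself; real text has a far more
skewed word distribution, so replace `words` with the words of an
actual file before drawing any conclusions.

```python
import random
import timeit


def count_try(words):
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            corpus[word] = 1
    return corpus


def count_get(words):
    corpus = {}
    for word in words:
        corpus[word] = corpus.get(word, 0) + 1
    return corpus


def count_setdefault(words):
    corpus = {}
    for word in words:
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


# Assumption: a made-up vocabulary drawn uniformly at random.
# This only exercises the harness; it does not mimic real prose.
random.seed(0)
vocabulary = ["w%d" % i for i in range(2000)]
words = [random.choice(vocabulary) for _ in range(50000)]

timings = {}
for func in (count_try, count_get, count_setdefault):
    # Take the best of several repeats to reduce noise from other processes.
    runs = timeit.repeat(lambda: func(words), number=3, repeat=3)
    timings[func.__name__] = min(runs)

for name, seconds in sorted(timings.items()):
    print("%-18s %.4f sec" % (name, seconds))
```

Only the counting is timed here; the word list is built beforehand,
which matches the "just do the counting part" measurement.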
For great test cases (and for your own personal edification),
visit Project Gutenberg.
Beware when you do timing: whether the file is "warm" (already in
the operating system's cache) or not can make a huge difference.
Read through it once before timing any of the versions.
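
One way to apply that advice is sketched below: read the file once
and throw the data away, then start the clock. The helper names
(`count_words`, `timed_count`) and the temporary demo file are
hypothetical; point `timed_count` at your own text file instead.

```python
import os
import tempfile
import time


def count_words(path):
    """Count whitespace-separated words in a text file."""
    corpus = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for word in line.split():
                corpus[word] = corpus.get(word, 0) + 1
    return corpus


def timed_count(path):
    """Warm the file with one untimed read, then time the counting pass."""
    # Warm-up pass: read the whole file in chunks and discard the data,
    # so the timed pass is less likely to wait on the disk.
    with open(path, "rb") as f:
        while f.read(1 << 20):
            pass
    start = time.perf_counter()
    corpus = count_words(path)
    return corpus, time.perf_counter() - start


# Demo on a tiny temporary file (an assumption, to keep the example
# self-contained).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("to be or not to be\n")
    demo_path = tmp.name

try:
    corpus, seconds = timed_count(demo_path)
finally:
    os.remove(demo_path)
```

This time includes reading and splitting the lines, so it is not
directly comparable to a counting-only measurement.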
--Scott David Daniels
Scott.Daniels at Acm.Org