December 2002 comp.lang.* stats

Erik Max Francis max at
Sun Jan 26 02:47:14 CET 2003

Peter Hansen wrote:

> Spam is probably a problem best ignored.  It would probably
> affect all those groups equally anyway.

Actually, that's one of the problems with his collapsing hierarchies
into a single number.  To first order, spammers would probably post to
every comp.* group with the same frequency.  So if a hierarchy contains
six groups, the raw numbers will likely be overcounting spam by
approximately a factor of six, as compared to a solitary newsgroup.
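
As a rough illustration of the first-order effect, here is a minimal Python sketch. All of the figures are made up for the example (the per-group post counts and the flat spam rate are assumptions, not data from the stats being discussed), and `collapsed_count` is a hypothetical helper name.

```python
def collapsed_count(real_posts_per_group, spam_per_group):
    """Collapse a hierarchy into one number, as the stats do.

    Assumes every group receives the same amount of spam
    (the first-order assumption from the post).
    Returns (total posts including spam, total spam).
    """
    total_real = sum(real_posts_per_group)
    total_spam = spam_per_group * len(real_posts_per_group)
    return total_real + total_spam, total_spam


# Hypothetical figures: real posts per group in one month.
hierarchy = [500, 300, 200, 100, 50, 50]  # a hierarchy of six groups
solitary = [1200]                         # a single, solitary newsgroup
SPAM_PER_GROUP = 100                      # assumed flat spam rate per group

h_total, h_spam = collapsed_count(hierarchy, SPAM_PER_GROUP)
s_total, s_spam = collapsed_count(solitary, SPAM_PER_GROUP)

print(h_total, h_spam)   # hierarchy: raw total and spam portion
print(s_total, s_spam)   # solitary group: raw total and spam portion
print(h_spam / s_spam)   # spam ratio, equal to the number of groups
```

Both have the same amount of real traffic in this example, yet the collapsed hierarchy's raw number carries six times as much spam as the solitary group's.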

To second order, there's probably an additional effect: newsgroups
whose names sort lexicographically early get more spam, since most
spammers post to groups sequentially, and those who get forcibly
stopped partway through are less likely to have hit comp.lang.z than
comp.lang.a.
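
The ordering effect can be sketched the same way. This assumes a toy model that is not in the original post: a spammer walks the group list in sorted order and is cut off after k groups, with k equally likely to be anything from 1 to n. The function name `hit_probability` and the group list are illustrative only.

```python
def hit_probability(groups):
    """Chance that each group gets spammed before the spammer is stopped.

    Toy model (an assumption): groups are hit in sorted order and the
    spammer is cut off after k groups, k uniform over 1..n. The group
    at sorted position i (0-based) is hit whenever k >= i + 1, which
    happens for n - i of the n possible values of k.
    """
    names = sorted(groups)
    n = len(names)
    return {name: (n - i) / n for i, name in enumerate(names)}


probs = hit_probability(
    ["comp.lang.z", "comp.lang.python", "comp.lang.a", "comp.lang.c"]
)
for name, p in probs.items():
    print(name, p)
```

Under this model comp.lang.a is always hit, while comp.lang.z is hit only when the spammer gets through the entire list.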

