December 2002 comp.lang.* stats

John Roth johnroth at ameritech.net
Sun Jan 26 08:15:22 EST 2003


"Peter Hansen" <peter at engcorp.com> wrote in message
news:3E335278.8F2DD2E8 at engcorp.com...
> Erik Max Francis wrote:
> >
> > Peter Hansen wrote:
> >
> > > Spam is probably a problem best ignored.  It would probably
> > > affect all those groups equally anyway.
> >
> > Actually, that's one of the problems with his collapsing hierarchies
> > into a single number.  To first order, spammers would probably post to
> > every comp.* group with the same frequency.  So if a hierarchy contains
> > six groups, the raw numbers will likely be overcounting spam by
> > approximately a factor of six, as compared to a solitary newsgroup.
>
> I would think that counting only unique posters would eliminate a lot
> of this effect, as the same poster would be sending to each newsgroup.
> Yes, many use random addresses... but don't they still send in bulk?
>
> > To second order, there's probably an additional effect of newsgroups
> > with names that sort lexicographically early getting more spam, since
> > more spammers do their spams sequentially, and those that get forcibly
> > stopped will be less likely to hit comp.lang.z than comp.lang.a.
>
> I strongly doubt anyone gets stopped fast enough to prevent their
> spamming one comp.lang group shortly after they've done another one.
>
> In the end, my comment should really be taken as "spam is a small
> enough issue, in my experience, to be ignored in the results as
> mere noise".  I readily admit my experience is limited to c.l.p
> and several other groups *not* in the c.l. hierarchy, so maybe
> some of those other groups get *much* more spam than c.l.p, but
> I sort of doubt it.  Maybe someone will take the time to calculate
> actual numbers to prove or disprove this point.  I wouldn't bother
> though.
>
> -Peter

I've been amused by this subthread, since I've almost never seen
spam in any of the comp.* groups I frequent. Maybe this has to
do with my using a paid service that does an excellent job of
de-spamming their newsfeed.

If someone ran the script against, say, Supernews, I doubt the
numbers would be significantly different. But maybe they would be.

John Roth