Using the nntplib module to count Google Groups users
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sat Oct 26 23:32:25 EDT 2013
There's been a bit of a discussion about how prevalent Google Groups
users are in this forum. This is a good opportunity to use one of
Python's standard library modules to scan through the comp.lang.python
newsgroup and find out. So here's some code to do so:

import nntplib
import sys

s = nntplib.NNTP('news.internode.on.net')  # footnote [1]
resp, count, first, last, name = s.group('comp.lang.python')
print 'Group', name, 'has', count, 'articles, range', first, 'to', last
print 'Checking the most recent (approx) 5000 messages...'
last = int(last)
count = 0  # articles whose headers were successfully fetched
gg = 0     # articles with at least one Google Groups-looking header
template = "\rArticle %d: found %d Google Groups headers."
try:
    for id in range(last-5000, last+1):
        try:
            # head() returns (response, number, message-id, header lines);
            # only the list of header lines is scanned below.
            resp, number, msgid, lines = s.head(str(id))
        except Exception:
            # Article is missing, expired or cancelled; skip it.
            continue
        count += 1
        for line in lines:
            if "google" in line and "group" in line:
                gg += 1
                sys.stdout.write(template % (id, gg))
                sys.stdout.flush()
                break  # count each article at most once
except KeyboardInterrupt:
    pass
finally:
    print
    s.quit()
print "Google Groups posts: %.2f%% of %d" % (gg*100.0/count, count)

Footnote [1] For this to work, you will need to be a subscriber with the
ISP Internode. If you are not, you will need to substitute your ISP's
news server. (Or your own, if you are running your own news server.)
This is a relatively busy newsgroup, and consequently downloading all the
headers may take a while, which is why I have limited it to only the most
recent 5000. I get this output:

Group comp.lang.python has 150071 articles, range 369087 to 519157
Checking the most recent (approx) 5000 messages...
Article 519153: found 957 Google Groups headers.
'205 Transferred 12653216 bytes in 0 articles, 0 groups. Disconnecting.'
Google Groups posts: 19.14% of 5001

Note that this *definitely* over-counts Google Groups. It includes
replies to GG posts as well as posts actually sent via GG, and there are
other false positives too. But as a rough-and-ready estimate, I think
it is good evidence that fewer than 1 in 5 posts come from Google Groups,
so definitely a minority, and by a long way.
Naturally this doesn't count lurkers who read via GG but never post. Nor
does it count distinct users, only distinct posts.
If anyone wants to modify the script to determine the ratio of posters,
rather than posts, using GG, be my guest. I'd be interested in the
answer, but not interested enough to actually do the work myself.
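
For anyone who does want to try, here is a minimal sketch of the counting step only, not a finished script. It assumes each article's headers are available as a list of text lines (the last element of what s.head() returns in Python 2; Python 3's nntplib hands back bytes, which would need decoding first). It also assumes that a post was sent via GG when its own Message-ID is in the googlegroups.com domain, and that a poster can be identified by the address in the From header. Both are approximations. The names header_value, is_gg_post and poster_ratio are made up for this example, and folded (multi-line) headers are ignored.

```python
from email.utils import parseaddr


def header_value(lines, name):
    """Return the value of the first header called `name`, or '' if absent."""
    prefix = name.lower() + ":"
    for line in lines:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return ""


def is_gg_post(lines):
    # Assumption: posts sent via Google Groups carry a Message-ID in the
    # googlegroups.com domain. Replies to GG posts mention that domain only
    # in References/In-Reply-To, so they are not counted here.
    message_id = header_value(lines, "Message-ID").lower()
    return message_id.endswith("@googlegroups.com>")


def poster_ratio(articles):
    """Return (gg_posters, all_posters) for an iterable of header-line lists."""
    posters = set()
    gg_posters = set()
    for lines in articles:
        # Assumption: the From address identifies the poster.
        address = parseaddr(header_value(lines, "From"))[1].lower()
        if not address:
            continue
        posters.add(address)
        if is_gg_post(lines):
            gg_posters.add(address)
    return len(gg_posters), len(posters)
```

To use it, collect the header lines for each article in the loop instead of scanning them on the spot, then pass the collected lists to poster_ratio() at the end. Someone who posts both via GG and via another client is counted as a GG poster here; whether that is the right call depends on what question you are asking.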
--
Steven