[Tutor] Top posters for 2009

Kent Johnson kent37 at tds.net
Fri Feb 26 03:53:24 CET 2010


It's not really about keeping score :-), but once again I've compiled
a list of the top 20 posters to the tutor list for the last year. For
2009, the rankings are

2009 (7730 posts, 709 posters)
====
Alan Gauld 969 (12.5%)
Kent Johnson 804 (10.4%)
Dave Angel 254 (3.3%)
spir 254 (3.3%)
Wayne Watson 222 (2.9%)
bob gailer 191 (2.5%)
Lie Ryan 186 (2.4%)
David 127 (1.6%)
Emile van Sebille 115 (1.5%)
Wayne 112 (1.4%)
Sander Sweers 111 (1.4%)
Serdar Tumgoren 100 (1.3%)
Luke Paireepinart 99 (1.3%)
wesley chun 99 (1.3%)
W W 74 (1.0%)
Marc Tompkins 72 (0.9%)
A.T.Hofkamp 71 (0.9%)
Robert Berman 68 (0.9%)
vince spicer 63 (0.8%)
Emad Nawfal 62 (0.8%)

Alan, congratulations, you pulled ahead of me for the first time in
years! You posted more than in 2008; I posted less. Overall posts are
up from last year, which was the slowest year since I started
measuring (2003).

Thank you to everyone who asks and answers questions here!

The rankings are compiled by scraping the monthly author pages from
the tutor archives, using Beautiful Soup to extract author names. I
consolidate counts for different capitalizations of the same name but
not for different spellings. The script is below.

Kent

# -*- coding: latin-1 -*-
''' Counts all posts to Python-tutor by author'''
from datetime import date, timedelta
import operator, urllib2
from BeautifulSoup import BeautifulSoup

today = date.today()

for year in range(2009, 2010):
    startDate = date(year, 1, 1)
    endDate = date(year, 12, 31)
    thirtyOne = timedelta(days=31)
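    # Adding 31 days from a date early in the month always lands in the next
    # month, so the loop below visits each monthly author page exactly once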
    counts = {}

    # Collect all the counts for a year by scraping the monthly author archive pages
    while startDate < endDate and startDate < today:
        dateString = startDate.strftime('%Y-%B')
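        # e.g. '2009-January', matching the pipermail monthly archive URLs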

        url = 'http://mail.python.org/pipermail/tutor/%s/author.html' % dateString
        data = urllib2.urlopen(url).read()
        soup = BeautifulSoup(data)

        # Skip the first two and last two <li> items, which are navigation links rather than messages
        li = soup.findAll('li')[2:-2]

        for l in li:
            name = l.i.string.strip()
            counts[name] = counts.get(name, 0) + 1

        startDate += thirtyOne

    # Consolidate names that vary by case under the most popular spelling
    nameMap = dict() # Map lower-case name to most popular name

    # Use counts.items() so we can delete from the dict.
    for name, count in sorted(counts.items(), key=operator.itemgetter(1), reverse=True):
        lower = name.lower()
        if lower in nameMap:
            # Add counts for a name we have seen already and remove the duplicate
            counts[nameMap[lower]] += count
            del counts[name]
        else:
            nameMap[lower] = name

    totalPosts = sum(counts.itervalues())
    posters = len(counts)

    print
    print '%s (%s posts, %s posters)' % (year, totalPosts, posters)
    print '===='
    for name, count in sorted(counts.iteritems(), key=operator.itemgetter(1), reverse=True)[:20]:
        pct = round(100.0*count/totalPosts, 1)
        print '%s %s (%s%%)' % (name.encode('utf-8', 'xmlcharrefreplace'), count, pct)
    print
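
For anyone on Python 3, here is a rough, untested sketch of the same approach.
It assumes the bs4 (Beautiful Soup 4) package instead of the old BeautifulSoup
module, uses urllib.request in place of urllib2, and leaves out the
name-consolidation step for brevity:

''' Rough Python 3 sketch of the same post-counting approach (untested) '''
from datetime import date, timedelta
from collections import Counter
from urllib.request import urlopen
from bs4 import BeautifulSoup

counts = Counter()
start, end, today = date(2009, 1, 1), date(2009, 12, 31), date.today()

while start < end and start < today:
    url = 'http://mail.python.org/pipermail/tutor/%s/author.html' % start.strftime('%Y-%B')
    soup = BeautifulSoup(urlopen(url).read(), 'html.parser')
    # Skip the navigation items at the top and bottom of the author list
    for item in soup.find_all('li')[2:-2]:
        counts[item.i.string.strip()] += 1
    start += timedelta(days=31)

totalPosts = sum(counts.values())
print('%s (%s posts, %s posters)' % (2009, totalPosts, len(counts)))
print('====')
for name, count in counts.most_common(20):
    print('%s %s (%s%%)' % (name, count, round(100.0 * count / totalPosts, 1)))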

