programming languages (etc) "web popularity" fun

Alex Martelli aleax at aleax.it
Fri Oct 31 06:30:47 EST 2003


(You need Mark Pilgrim's pygoogle, see
http://diveintomark.org/projects/pygoogle/ , and a personal license key for
the Google API, see http://www.google.com/apis/ , saved in a file such as
"googlekey.txt" in your home directory [pygoogle looks in several places,
see http://diveintomark.org/projects/pygoogle/readme.txt for the list].)

So, a little script such as...:

#! /usr/local/bin/python2.3
# programming languages popularity web-survey

import google
import time

def quoter(xs): return ['"%s"'%x for x in xs]
langs = '''
    python ruby perl caml java haskell lisp eiffel sml scheme
    fortran ada forth apl javascript ecmascript vbscript vba sql
    bash awk tcsh csh zsh ksh autolisp elisp occam intercal basic
    abc algol applescript assembly befunge beta chill cobol dylan
    erlang pascal delphi idl limbo smalltalk squeak m4 matlab logo
    foxpro turing tcl snobol simula setl self rexx rebol postscript
    php oz modula ml miranda mercury mumps oberon sather stackless
    functional procedural parallel hpf agile extreme database
    relational rpg
    '''.split() + quoter([
      'visual basic', 'object pascal', 'objective c', 'c++', 'c#', 'c',
      'stackless python', 'object oriented',
    ])

# ensure all duplications are removed
langs = dict.fromkeys(langs).keys()

print 'examining %d terms' % len(langs)
results = []
for i, lang in enumerate(langs):
    # ...compensate for frequent "internal errors" from google...
    while True:
        print '%2d: %20s' % (i, lang.strip('"'), ),
        try: data = google.doGoogleSearch(lang + ' programming')
        except Exception:
            print "... likely internal server error, we wait & retry... "
            time.sleep(0.5)
        else:
            results.append((data.meta.estimatedTotalResultsCount, lang))
            # give running feedback since it DOES take a while!
            print '%9d' % data.meta.estimatedTotalResultsCount
            break
results.sort()
results.reverse()
print
print
print '%20s %9s' % ("Language", "# of hits")
print

for numb, lang in results:
    print '%20s %9d' % (lang.strip('"'), numb)


Gives me the following results:

            Language # of hits

                   c   4980000
            database   3750000
               basic   3750000
                java   3320000
                self   2000000
                 php   1880000
                 c++   1860000
                perl   1640000
                 sql   1150000
                logo   1070000
            parallel   1030000
          javascript   1030000
          functional    997000
     object oriented    944000
        visual basic    847000
                beta    745000
              python    729000
              scheme    693000
            assembly    687000
               forth    591000
             extreme    572000
                  c#    506000
          relational    377000
              delphi    354000
             fortran    344000
              pascal    329000
          postscript    297000
                 tcl    277000
                 abc    259000
                lisp    220000
          procedural    204000
                  ml    201000
                 ada    196000
            vbscript    181000
               cobol    171000
              foxpro    137000
                 vba    123000
              matlab    111000
           smalltalk    101000
                ruby     97900
                bash     87400
             mercury     86800
                 rpg     81600
                  oz     78500
              turing     72200
                rexx     66100
               agile     62700
              eiffel     58300
                 idl     58100
             haskell     55100
                 awk     53100
               mumps     49800
               chill     47600
         objective c     44900
              modula     39000
                 apl     38800
                 csh     31700
               dylan     31500
              simula     30600
              erlang     29900
                  m4     28000
              squeak     24400
             miranda     24300
         applescript     24000
       object pascal     23900
               algol     21000
                 ksh     17900
                tcsh     17600
                 sml     16000
              oberon     15400
                caml     15300
                 hpf     11900
               limbo     11400
               rebol     10800
               occam     10300
               elisp      8780
          ecmascript      7080
                 zsh      5640
            autolisp      5430
              sather      4260
              snobol      3900
            intercal      2700
                setl      2010
           stackless      1040
             befunge       951
    stackless python       431

Of course there are quite a few anomalies here -- e.g. I think there is
no automatic way to "clean" the C hit count from the hits for objective c,
c++, c# -- basic from visual basic -- and so on.  But then, this is for
fun, not a scientific query, which is why I've mixed other catchwords
with the programming languages as I thought of them.

Doing some "eyeball cleanup" we can see that c, net of c++, c# etc, must
be a little below Java; basic, net of visual basic, ditto.  'self' is,
alas, very unlikely to refer to that little-known though interesting
language:-).  Similarly for 'logo', 'beta', ... -- and 'sql' is likely
to be mixed up with many other languages too.
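
The "net of" reasoning can be roughed out in a few lines.  This is only a
sketch, written for a current Python 3 rather than the 2.3 the script
targets.  It assumes, crudely, that every hit for a longer name such as
"c++" is also counted under the shorter one, and simply subtracts.  The
counts are copied from the table; the names `hits` and `net_of` are made
up for illustration.

```python
# Hit counts copied from the results table above.
hits = {
    'c': 4980000, 'c++': 1860000, 'c#': 506000, 'objective c': 44900,
    'basic': 3750000, 'visual basic': 847000, 'java': 3320000,
}

def net_of(term, overlapping):
    """Crude estimate: subtract the hits of longer names containing `term`.

    Assumption (not verified): every hit for an overlapping name is also
    a hit for `term`.  If that overstates the overlap, the true net
    count is higher than what this returns.
    """
    return hits[term] - sum(hits[name] for name in overlapping)

c_net = net_of('c', ['c++', 'c#', 'objective c'])
basic_net = net_of('basic', ['visual basic'])

for name, count in [('java', hits['java']), ('c (net)', c_net),
                    ('basic (net)', basic_net)]:
    print('%20s %9d' % (name, count))
```

Under that assumption both net counts come out below Java's, in line with
the eyeball estimate; the subtraction is far too crude to settle how c and
basic rank against each other.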

So, I think the top ten places, in order, for actual languages, are really:
        java
           c (not objective/c++/c#)
       basic (not visual)
         php
         c++
        perl
  javascript
visual basic
      python
      scheme

Not too surprising, I guess.  One could explore a bit more of course
(e.g. specifically look for 'basic -visual' etc etc) but I'm running
a bit short of my daily 1000 searches so I'm gonna leave that fun to
you, o readers.  Points to ponder: the preponderance of visual basic
over python, and of python over scheme, is really small; the latter
may perhaps be explained by some occurrences of 'scheme' as an ordinary
word rather than the language name, and the former by the fact that the
typical web usage of many visual basic programmers is unlikely to include
writing websites about VB, compared to the web usage of Pythonistas.
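
For anybody taking up the 'basic -visual' idea, here is a minimal sketch of
building such query strings (Python 3 again; `build_query` and its
parameters are hypothetical names, not part of pygoogle).  It relies only
on the usual search-engine convention that a leading '-' excludes a word;
the resulting string would be passed to google.doGoogleSearch in place of
lang + ' programming' in the script.

```python
def build_query(term, exclude=(), suffix='programming'):
    """Return a search string such as 'basic -visual programming'.

    `exclude` lists words to rule out with a leading '-'; `suffix` is the
    extra word the script appends to every term.
    """
    parts = [term]
    parts.extend('-' + word for word in exclude)
    parts.append(suffix)
    return ' '.join(parts)

print(build_query('basic', ['visual']))   # basic -visual programming
print(build_query('python'))              # python programming
```

Each refined query still costs one search against the daily quota, so it
pays to pick the exclusions carefully.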

If scheme's apparent popularity does turn out to be an artefact, then
forth (or is it an artefact from "go forth" etc...?-), assembly (but IS
that used in the programming sense...?), and C# are the other possible
contenders for the coveted tenth place.  After the contenders for the
top places we have a (to me!) somewhat surprising bunch -- delphi,
fortran, pascal, postscript (!), tcl, abc (!?), lisp, ml, ada, and
vbscript in this order.  Wow -- how are the mighty fallen! -- cobol
is BELOW this second bunch...!

Coming to buzzwords that aren't programming languages, other
surprises await: "functional" edges out "object oriented", "extreme"
is WAY more popular than "procedural" (yeah right:-), "agile"
programming isn't as popular a term as I'd have thought (but still,
more than eiffel...:-).

Plenty of other food for flamewars here -- can mercury AND oz
really be THAT much more popular than haskell, erlang, caml -- the
latter badly outscored even by OLD miranda -- and ML so WAY more
popular than ALL other pure functional languages & dialects (and
indeed even more than ada, vbscript, cobol, foxpro, vba, matlab,
smalltalk, ruby, bash...)...?!

Googling sure _IS_ plenty of fun!!!-)


Alex




