programming languages (etc) "web popularity" fun

Alex Martelli aleax at aleax.it
Fri Oct 31 11:16:05 EST 2003


Cameron Laird wrote:
   ...
> It's easy to imagine sources of noise for these data, including
> such English-language commonplaces as "go forth" you already

Sure!  Although adding "programming" to the search, as my
little script did, is, I believe, going to help a lot, it's not
magic.  If a language were called, for example, 'and', we'd NEVER
manage to get reliable statistics about it:-).

Actually there is a lesson here about "product naming for the
21st century".  If you want to help people googling for your
product (firm, project, whatever), *use a made-up word* so that
all the google hits on it will be real ones.  If you want to make
sure you're basically ungooglable-for, well -- take a leaf from
MS, and name your technologies "COM", ".NET" and so on:-).

> mentioned.  A next step might be to try to refine the queries
> to eliminate classes of noise.  The one that most catches my
> attention is PHP; I've got to think that a lot of those are pages
> that use PHP, rather than discuss it.

No doubt, and google can help a little with THIS kind of artefact,
thanks to the "allintext:" qualifier.  (BTW, should anybody with
any interest in web searching not have O'Reilly's book "Google
Hacks" yet, GET IT!-).

So, I've made a 2nd release of my script, more targeted at those
languages which stand a chance for the top spots and more subject
to automatic cleaning.  The quoter function is gone, and the langs
variable is now built in more detail:

langs = [x.strip() for x in '''
    "c" -"c++" -"c#"
    basic -visual
    "c++"
    "visual basic"
    "assembly language" OR "machine code" OR "machine language"
    forth -"go forth" -"and so forth"
    "c#"
    pascal -object
    [ ...many simple unquoted single-word languages snipped... ]
    smalltalk
    ruby
    '''.splitlines() if x.strip()]

and the search, in the loop, has become:

    data = google.doGoogleSearch('allintext: %s programming' % lang)
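
For anybody who wants to check the exact query strings without hitting
Google at all, here is a minimal sketch of just the string-building step.
The names LANGS_TEXT and build_queries are made up for this illustration
(they are not part of the script), and only a few of the langs entries are
repeated here:

```python
# Hypothetical helper, not part of the original script: it only builds the
# query strings and never contacts Google.
LANGS_TEXT = '''
    "c" -"c++" -"c#"
    basic -visual
    forth -"go forth" -"and so forth"
    ruby
    '''


def build_queries(langs_text, extra_word="programming"):
    """Return one 'allintext:' query per non-blank line of langs_text."""
    # Same idiom as the langs list: strip each line, skip the empty ones.
    langs = [line.strip() for line in langs_text.splitlines() if line.strip()]
    # Each spec is used verbatim, so quotes and -exclusions reach the search.
    return ['allintext: %s %s' % (lang, extra_word) for lang in langs]


for query in build_queries(LANGS_TEXT):
    print(query)
```

Printing the queries first makes it easy to paste one into an interactive
search and compare the counts by hand.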


With these refinements, we get the following top 30 languages:

Language                                                     # of hits

java                                                           3050000
"c" -"c++" -"c#"                                               2470000
basic -visual                                                  1880000
"c++"                                                          1710000
perl                                                           1510000
php                                                            1060000
javascript                                                      939000
"visual basic"                                                  758000
python                                                          682000
scheme                                                          642000
"c#"                                                            460000
forth -"go forth" -"and so forth"                               325000
fortran                                                         322000
delphi                                                          305000
tcl                                                             254000
postscript                                                      236000
abc                                                             233000
lisp                                                            201000
ada                                                             177000
ml                                                              174000
vbscript                                                        165000
cobol                                                           157000
"assembly language" OR "machine code" OR "machine language"     146000
pascal -object                                                  142000
foxpro                                                          127000
vba                                                             112000
matlab                                                          103000
smalltalk                                                        90300
ruby                                                             88000

PHP has indeed lost a couple of notches, and so have forth, assembly
(most particularly), pascal and basic.  The "top 10" are still the same,
though.  A few hints for would-be further-cleaner-uppers:

abc programming gets a LOT of help from one certain TV network!-)
    [all others on this list, from a simple eyeball test w/interactive
     searches on 1st pages only, appear legit]
c is HEAVILY handicapped by those "-" exclusions; if we did
    java -"c++" -"c#" (tried interactively), we'd only get 2,370,000,
    so c is in fact still quite likely to be king of the heap (same
    query, interactive, with C instead of java, is over 4,000,000...)
this is an example of the fact that these numbers don't get reproduced 
    when I try the same google queries interactively (in opera) -- there
    may be different filtering schemes in play
being careful is of course particularly warranted when two contenders
    appear to be very close, and there are many such pairs here --
    python and scheme, forth and fortran, ada and ml, smalltalk and ruby...
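
Spotting the too-close-to-call pairs can be automated, too.  A rough
sketch follows: the 10% tolerance is an arbitrary assumption of this
example, the names HITS and close_pairs are made up for it, and only a
subset of the counts listed earlier is included.

```python
# Subset of the hit counts reported in the table (language, hits).
HITS = [
    ("java", 3050000), ("c", 2470000),
    ("python", 682000), ("scheme", 642000),
    ("forth", 325000), ("fortran", 322000),
    ("ada", 177000), ("ml", 174000),
    ("smalltalk", 90300), ("ruby", 88000),
]


def close_pairs(hits, tolerance=0.10):
    """Return neighbouring (higher, lower) names whose relative gap is small.

    tolerance is an assumed cut-off: a pair counts as "close" when the
    lower count is within that fraction of the higher one.
    """
    ranked = sorted(hits, key=lambda item: item[1], reverse=True)
    pairs = []
    # Compare each entry with the one ranked immediately below it.
    for (name_a, count_a), (name_b, count_b) in zip(ranked, ranked[1:]):
        if (count_a - count_b) / float(count_a) <= tolerance:
            pairs.append((name_a, name_b))
    return pairs


for higher, lower in close_pairs(HITS):
    print("%s vs %s: too close to call" % (higher, lower))
```

Since the counts differ between API and interactive queries anyway, any
ranking inside such a pair should probably be treated as a tie.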

Let's see what somebody else can dream up, perhaps on a very different
tack than my idea of tacking the word 'programming' on...


Alex




