syntax philosophy

Andrew Dalke adalke at mindspring.com
Tue Nov 18 02:03:41 EST 2003


Tuang:
> But I'm surprised at what you apparently have to go through to do
> something as common as counting the frequency of elements in a
> collection. For example, counting word frequency in a file in Perl
> means looping over all the words with the following line:
   ...
> This seems sort of unPythonesque to me, given the relative cleanliness
> and obviousness (after seeing it once) of other common Python
> constructs.
>
> But I guess I'm making assumptions about what Python's philosophy
> really is.

I see several replies already, but they don't seem to address your
question about the philosophical reasons for this choice.

A Python philosophy is that "Errors should never pass silently."
(To see some of the other points, 'import this' from the Python
prompt.)

When you write '$histogram{$word}++' in Perl it automatically
creates the hash 'histogram' and creates an entry for $word.  The
new entry starts out as undef, which ++ treats as 0.  This is great,
as long as you don't make mistakes.

But people do make mistakes and misspell variables.  Had you
written '$histrogram{$word}' then Perl would have simply
created a new hash for you with that name.  This is enough of
a problem in Perl that it's recommended you 'use strict'
and declare the hash beforehand, as 'my %histogram'.

Python takes this approach by default, so there's no need for
the 'use strict' declaration.  Python also doesn't have the
sigil-based typing of Perl so you need to tell it what object
to create, hence the need for 'histogram={}'.
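
To make that concrete, here is a small sketch of what a misspelled
name does in Python.  The function name and the sample word are made
up for illustration.  Instead of quietly creating a new dictionary,
the misspelled lookup fails with a NameError.

```python
histogram = {}  # the dictionary has to be created explicitly

def add_word(word):
    # 'histrogram' is a deliberate misspelling of 'histogram'
    histrogram[word] = 1

try:
    add_word("spam")
except NameError as err:
    caught = str(err)   # the message names the unknown variable
else:
    caught = None

print(caught)
```

The correctly spelled 'histogram' is left untouched, so the mistake
shows up at the point where it was made.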

Similarly, dictionaries require that entries be created before they
can be used.  This is because it's impossible for Python to
know which value you want for the default.  Python is strongly
typed, so "2"+1 will raise an exception, unlike Perl where it
yields the number 3.  If Python used a 0 for the default then
what if you really wanted to concatenate strings?  If it used
"" then what if you wanted to add numbers?  Whatever choice
you make, it will be wrong for most cases.
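
A short sketch of both points, using made-up keys and values: mixing
a string and a number is an error, and the sensible starting value
depends on what you are about to do with the entry.

```python
# Mixing a string and a number is an error rather than a coercion.
try:
    "2" + 1
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# The "right" starting value depends on the operation that follows.
counts = {}
counts["spam"] = counts.get("spam", 0) + 1        # numeric default

joined = {}
joined["spam"] = joined.get("spam", "") + "egg"   # string default

print(mixed_ok, counts, joined)
```

In each case the caller states the default at the point of use, so
Python never has to guess.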

It works in Perl because of Perl's weak type system -- or
permissive coercion system if you want to look at it that way --
and corresponding 'typed' operators, so that + and . coerce
to numbers or strings, respectively.

I've also found that the requirement that the key exists before
being used catches mistakes similar to the requirement that
variables exist before being used.
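
As an illustration, assuming a small dictionary of invented sample
data, a typo in a key is reported right away instead of silently
starting a second entry:

```python
ages = {"alice": 31, "bob": 27}   # made-up sample data

try:
    ages["alcie"] += 1            # deliberate typo for "alice"
except KeyError as err:
    missing = err.args[0]         # the key that was not found
else:
    missing = None

print(missing, ages)
```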

If you want, you can make a class which acts like a perl
hash, and assigns a default value or lets you redefine what
to use for that default.  Here's a start (similar in result but
different in approach to Peter Otten's example)

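import UserDict

class PerlDict(UserDict.DictMixin):
  def __init__(self, default = 0):
    self.data = {}
    self.default = default
  def __getitem__(self, key):
    try:
      return self.data[key]
    except KeyError:
      # a missing key is created with the default value
      self.data[key] = self.default
      return self.default
  def __setitem__(self, key, item):
    self.data[key] = item
  def __delitem__(self, key):
    # deleting a missing key is not an error
    try:
      del self.data[key]
    except KeyError:
      pass
  def keys(self):
    # DictMixin builds the rest of the dict interface on keys()
    return self.data.keys()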

But even with this you won't save much code.  Here's what
it looks like:

histogram = PerlDict()
for line in open(filename):
  for word in line.split():
    histogram[word] += 1

Compare that to the canonical Python implementation

histogram = {}
for line in open(filename):
  for word in line.split():
    histogram[word] = histogram.get(word, 0) + 1
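
For readers on a newer Python (2.5 or later), the standard library's
collections.defaultdict follows the same reasoning: it does not guess
a default, it asks you for a factory that produces one.  A minimal
sketch, with an in-memory list standing in for the file:

```python
from collections import defaultdict

lines = ["spam eggs spam", "eggs spam"]   # stand-in for open(filename)

histogram = defaultdict(int)   # int() returns 0, so new words start at 0
for line in lines:
    for word in line.split():
        histogram[word] += 1

print(dict(histogram))
```

Passing str or list instead of int would give "" or [] as the
starting value, so the choice stays explicit.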

As several people pointed out, for this example you should
consider using a histogram/counter class, which would
separate intent from the actual calculation, as in

histogram = Histogram()
for line in open(filename):
  for word in line.split():
    histogram.count(word)

Your reply is that you're looking for the philosophy behind
Python, using the histogram as an example.  That actually
is part of the philosophy -- in Python it's much easier to
make a class and instantiate an object with the appropriate
behaviours than it is in Perl, what with Perl's "bless" and
shift and @ISA.  The above 'Histogram' is simply

class Histogram:
  def __init__(self):
    self.histogram = {}
  def count(self, word):
    self.histogram[word] = self.histogram.get(word, 0) + 1
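
A self-contained sketch of how such a class might be used.  The
sentence being counted is invented for the example, and the class is
repeated so the snippet runs on its own.

```python
class Histogram:
    def __init__(self):
        self.histogram = {}
    def count(self, word):
        # start at 0 for a word that has not been seen yet
        self.histogram[word] = self.histogram.get(word, 0) + 1

h = Histogram()
for word in "the cat saw the other cat".split():
    h.count(word)

# report the counts in alphabetical order
for word in sorted(h.histogram):
    print(word, h.histogram[word])
```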

In Perl the equivalent would be something like (and only
roughly like -- I never did fully figure out how to do Perl
OO correctly)

package Histogram;
sub new {
  my ($class) = @_;
  # bless takes the reference first, then the class name
  my $self = { 'histogram' => {} };
  return bless $self, $class;
}
sub count {
  # a method receives the object as its first argument
  my ($self, $word) = @_;
  $self -> {'histogram'}{$word}++;
}

However, for a one-off histogram this level of abstraction isn't
worthwhile.

To summarize, Python's philosophical differences from Perl
for your example are:
  - variables must be created before use (reduces errors)
  - dict entries must be created before use (reduces errors)
  - dict entries get no implicit default value (strong typing)
  - classes are easy to create (letting you create objects which
        better fit your domain)

                    Andrew
                    dalke at dalkescientific.com





