declaration of variables?

Alex Martelli aleaxit at yahoo.com
Mon Feb 17 10:35:31 EST 2003


André Jonsson wrote:

> Jp Calderone wrote:
>>>That may be the case, but it's better/easier if the error/warning occurs
>>>when the program is compiled (run-time or otherwise)
>> 
>>   No.  You think it is better or easier.  Many other people disagree.
> 
> Maybe so, but I know many also agree with me, so it's not really a
> clear-cut case.

Indeed, many people prefer other languages, designed under criteria very 
different from those applied in the design of Python.  But if you prefer
other languages, then maybe you should use those other languages, rather
than (giving the appearance of) coming to comp.lang.python to criticize
Python's design.  That may not have been your intention, but it was the
way your posts came across, at least to me and to some others.


>>   A good set of unit tests will catch this error any time it comes up. A
>> good set of unit tests will also catch many *other* errors that variable
>> declaration won't.  If you don't have good unit tests, it might be time
>> to consider writing some.
> 
> I haven't done many, no, so that's probably the case. I still think that

Yes.

> receiving information about an error sooner is better than later. Having

Sure, *all other things being equal* any information about an error is
(by some tiny amount) more valuable the sooner it is received.

But designing a language so as to maximize the probability that all
typo-induced errors are reported as soon as possible carries many other
costs, so the costs and benefits need to be weighed against each other.
To quote a bit
of my interview on the www.python-uk.org site...:
"""
And finally, what is the most important lesson people can learn from/about 
Python? 
Great design is not about making no compromises, it's about making just the 
RIGHT compromises. That's why it's never easy. 
"""

Let me try to communicate the point more clearly.  Single-character
operators such as + - * / are clearly very typo-prone.  + and - are
SO close on many keyboards, and look SO similar, that they're really
a typo waiting to happen.  And if a and b are numbers, what language
will tell you that you just typed a-b MEANING to type a+b...?  Eeek.

So what would you think of a language that just DIDN'T allow such
easy-to-make typos?  It could take a leaf from Cobol's old syntax
and force you to type "ADD a TO b" or "SUBTRACT a FROM b", rather
than offering dangerous usage such as "a+b" and "a-b".

If you're truly convinced that early diagnosis of typos is such a
key design issue, then why aren't you using Cobol, or some language
that uses similarly redundant, typo-averse syntax?

The fact is -- there's a trade-off; your coding productivity is
impacted by having to type verbose boilerplate such as "ADD a TO b"
every time you mean "a+b".  And similarly for mandatory variable
declarations.  Python dispenses with such boilerplate and lets you
write down "executable pseudocode" -- you'll have to rely on unit
tests to ensure that you're adding or subtracting as appropriate,
i.e. to guard against typos in operators; and since you HAVE to have
unit tests anyway for this, why not use them against other typos too?
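
To see how little machinery that takes -- a minimal sketch of my own,
with invented names, not code from any real project -- consider how a
one-line unit test catches exactly the a+b/a-b typo that no amount of
variable declaration would ever flag:

import unittest

def add(a, b):
    return a - b    # the typo: - where + was meant

class TestAdd(unittest.TestCase):
    def test_add(self):
        # a declaration-checking compiler happily accepts the typo
        # above; this simple check does not
        self.assertEqual(add(2, 3), 5)

if __name__ == '__main__':
    unittest.main()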

You may think I'm joking, but just a few years ago it took me DAYS to
track down (in an old piece of Fortran code at my previous employer)
one deuced bug that boiled down to an X+1 being where an X-1 should
have been -- the unit-test suite for that "tried and true" code was
definitely insufficient (to be honest: close to non-existent...).

> And I do think that declaring a variable isn't that much of a hassle,
> at least not of the magnitude you (and others) make it out to be. Chasing
> down "strange" errors is.

That's why one codes UNIT tests -- for SMALL units of code -- rather
than relying on SYSTEM tests, which may catch the bugs but will make
nailing them down to a specific typo far from easy.

As for what is or isn't a hassle, well, clearly good old Cobol
programmers think that typing "SUBTRACT x FROM y" isn't a hassle,
and that it makes their code more readable and error-proof.  If
you agree, I suggest you try Cobol, a much-maligned language as
it happens (in modern versions, it can do more than you think).

Personally, I prefer Python -- that's why I'm here.


> Hmm, how about this example: suppose a given function sorts some list, and
> there is an error in it that doesn't cause a run-time error. How is it
> possible to detect this without having to write the whole routine once
> more and compare the results? (which, of course, is an error-prone
> process).

Writing a routine twice is just silly.  Rather, you should write down
the exact specs that your sorting is supposed to respect, and have a
check for them -- the routine that checks is the best way to express
the specs, of course.

I would suggest the following executable specs:

import copy

def test_sorter(sorter, somelist, tester):
    # keep an untouched copy to compare against after sorting
    original = copy.copy(somelist)
    sorter(somelist)
    tester.assertEqual(len(somelist), len(original))
    if not somelist:
        return    # an empty list is trivially sorted
    previous_item = somelist[0]
    for item in somelist:
        tester.assertTrue(previous_item <= item)
        tester.assertEqual(somelist.count(item), original.count(item))
        previous_item = item

The assertion of equal lengths is not quite redundant: the count
checks only cover items that survive in somelist, so a sorter that
drops an item (turning, say, [1, 2] into [1]) would sail through
every count check yet fail the length check -- and it catches an
obvious error rapidly, too.

This test_sorter implementation is O(N ** 2) [because of the calls
to the count methods of lists] -- but, unit tests are supposed to
run on SMALL test cases, so that may not be a problem.  It's not
too hard to make an O(N) tester, if the lists' items can be
guaranteed to be hashable, of course - Python's dictionaries make
that pretty easy:

def test_sorter_1(sorter, somelist, tester):
    original = copy.copy(somelist)
    sorter(somelist)
    def countsOf(listofhashables):
        # tally how many times each (hashable) item occurs
        result = {}
        for item in listofhashables:
            result[item] = 1 + result.get(item, 0)
        return result
    tester.assertEqual(countsOf(somelist), countsOf(original))
    if not somelist:
        return
    previous_item = somelist[0]
    for item in somelist:
        tester.assertTrue(previous_item <= item)
        previous_item = item

However, for typical unit-testing needs, I would prefer the first
version.  It's simpler, it runs well on SMALL test lists (which are
what you should generally be using anyway), and it imposes no
hashability constraint on the items.
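
For concreteness, here is one way to wire the helper into a standard
unittest case -- a sketch of my own, with list.sort standing in for
whatever sorter you're actually testing, and arbitrary test data:

import unittest

class TestMySort(unittest.TestCase):
    def test_small_lists(self):
        for data in ([], [1], [3, 1, 2], [2, 2, 1, 3, 1]):
            # list.sort stands in for the sorter under test;
            # test_sorter is the helper defined above
            test_sorter(list.sort, data, self)

if __name__ == '__main__':
    unittest.main()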


Of course, sorting is somewhat of a moot case, since the sort
method of Python's lists is SO deucedly good you wouldn't normally
code (and thus have to test) other sort routines.  But the general
approach is sensible -- consider tests as, first of all, a kind
of *executable specification*.  If a complete specification takes
too long to execute (unit tests must be run OFTEN to be truly
valuable...), split your tests into two -- a complete set that
takes however long it needs to, and a subset that runs fast though
it may not be testing 100% of the specs.
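
One concrete way to arrange that split -- a sketch under my own
naming, nothing standard -- is to let the complete suite inherit from
the fast one, and make the fast one the default:

import random
import unittest

class QuickSpecs(unittest.TestCase):
    # the fast subset: cheap enough to run after every edit
    def test_small_input(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

class FullSpecs(QuickSpecs):
    # inherits the quick checks, and adds slow exhaustive ones
    def test_large_input(self):
        data = [random.random() for i in range(100000)]
        # sorting must be idempotent: re-sorting changes nothing
        once = sorted(data)
        self.assertEqual(once, sorted(once))

if __name__ == '__main__':
    # the quick subset runs by default; name FullSpecs on the
    # command line when there's time for the complete run
    unittest.main(defaultTest='QuickSpecs')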


Alex
