More random python observations from a perl programmer

Sat Aug 21 21:25:25 EDT 1999

[posted & mailed]

[Tom Christiansen]
>:>    You can't use "in" on dicts.  Instead, you must use

[Neil Schemenauer]
>: In the Python philosophy, every expression should have an obvious
>: meaning.  "in" in this case does not.  Perl has a different
>: philosophy here.

[Tom Christiansen]
> To me it would be clear that it's a wasteful linear search of the
> values, just as in a list it's a wasteful linear search of the
> elements.

Python doesn't mind if you want to do something (theoretically) inefficient,
and e.g.

    x = [3, 5, 11]
    if 5 in x:
        print "yes"

does print yes (darned fast, too <wink>).  So the ambiguity of "thing in
dict" doesn't resolve itself on this basis.  Many people also view sequences
as mapping an index set to the sequence elements, and under that view the
current "thing in sequence" isn't asking whether thing is in the index set;
so "for consistency", then, "thing in dict" would have to refer to
membership in dict.values().  Finally, since you can already ask about key
membership directly via dict.has_key(thing), it would be un-Pythonic to
introduce a second spelling ("there's only one (most obvious) way to do it"
is a Python goal).
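
To make the two readings concrete, here is a small sketch.  The dict d and
the variable names are invented for illustration, and the explicit
keys()/values() spellings are used so the snippet doesn't depend on what a
bare "thing in dict" might mean:

```python
x = [3, 5, 11]
d = {"red": 3, "blue": 5}

# Sequence membership tests the elements, not the index set.
in_list = 5 in x              # true: 5 is an element
index_in_list = 1 in x        # false, although 1 is a valid index

# For a dict, spell out which collection is meant.
in_values = 5 in d.values()   # linear search of the values
in_keys = "blue" in d.keys()  # the question the text spells d.has_key("blue")
```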

[on what's a local]
> I understand.  But it's well, surprising that it's all context-
> dependent.  Not fun to look for.

But there's nothing to look *for* <wink>.  That Python does not have
Scheme-like lexical closures, or dynamic scoping either, means that each
unqualified name is either purely local or global; there are no other
possibilities to worry about, and globals used on the LHS of an assignment
have to be explicitly declared "global" *in* every function they're so used.
So global-vs-local is a purely local static property of a function's source
text, and no context outside of a function's body need be looked at.  It's a
non-issue in practice.
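
A minimal sketch of that rule, with made-up names; whether "counter" is
local or global in each function can be read off that function's body alone:

```python
counter = 0

def bump():
    # Assigned here, so the declaration is needed for the global to change.
    global counter
    counter = counter + 1

def peek():
    # Only read here, so no declaration is needed.
    return counter

def shadow():
    # Assigned without a declaration: this "counter" is a new local.
    counter = 99
    return counter

bump()
after_bump = peek()
shadow_result = shadow()
after_shadow = peek()     # the global was not touched by shadow()
```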

IMO Python's rules wouldn't work nearly as well in the presence of nested
lexical scoping, so I've got mixed feelings about introducing lexical
closures.  Most closures are "tiny & simple" enough that the
default-argument trick is semantically adequate albeit syntactically clumsy.
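
For anyone who hasn't met it, the default-argument trick looks roughly like
this (a sketch; make_adders and its contents are invented for illustration):

```python
def make_adders():
    adders = []
    for n in range(3):
        # "n=n" evaluates n right now and stores it as the lambda's default,
        # so each function carries its own value along with it.
        adders.append(lambda x, n=n: x + n)
    return adders

add0, add1, add2 = make_adders()
results = [add0(10), add1(10), add2(10)]
```

The clumsy part is that the captured value shows up as an extra parameter a
caller could override by accident.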

> ... but I've never understood why
>
>     a += b
>
> isn't immediately obvious as
>
>     a = a + b

That would be OK by me, but it would preclude some very natural uses for the
shorthand syntax.  Mentioned before e.g. that the NumPy people would love
for

    a += b

to update matrix a in-place ("a" and "b" may each consume many dozens of
megabytes, and in-place modification can buy order-of-magnitude runtime
efficiencies when mucking with huge objects).
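
The distinction at stake, sketched with small lists standing in for the
matrices (the names are made up, and extend() is just one existing in-place
operation, not a claim about how NumPy would do it):

```python
a = [1, 2]
alias = a
a = a + [3]       # builds a brand-new list and rebinds the name a
# alias still names the original two-element list

b = [1, 2]
alias_b = b
b.extend([3])     # modifies the existing list; no new object is built
# alias_b names the same, now three-element, list
```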

Guido isn't opposed to += & friends; they simply haven't been implemented
yet.

>: As I mentioned earlier, everything is a reference.  You just have
>: to know what is mutable and what is not.  Lists, obviously from
>: your example, are.

> Alas, I'm pretty sure that this isn't how people think unless
> they're trained to do so.
>
> There's a reason that in Perl most people still use
>
>     @a = (1,2,3);
>
> not
>
>     $a = [1,2,3];
>
> It's because @a is a first-class array, and $a merely a reference
> thereto.

The majority of Perl dabblers (as opposed to serious Perl programmers, but I
don't know many of those) I know avoid the latter form because Perl's flavor
of references baffles them.

I don't think it's the "referenceness" per se, but the spelling that hangs
them up -- after the above, I'm not sure you can still sympathize with
someone asking "now what the heck is the difference between '$$a' and
'@$a'?!".  I agree it's perfectly logical, but *I* still hear it a lot <0.5
wink>.

The other thing I see a lot is

    @a = [5, 6, 7];

which is a very tempting mistake for people coming from other languages
(like, say, Python).  This doesn't create the array they expected, but a
1-element array containing a reference to the array they were expecting, and
e.g. now you have to get at the 2nd element via ${$a[0]}[1].  Last time I
tried that, "perl -w" let it slip by silently, too.  As you've noted
eloquently before, array-vs-scalar context is a "deadly sin" in Perl, and
I'll add that the introduction of references only made that aspect of the
language even touchier (now things that "look like" arrays can actually be
scalar).

> Copying things not producing a copy is well, not what non-serious
> programmers immediately think of.

Patience, here.  People will repeat this until it sinks in <wink>:
assignment never copies in Python.  It just binds a name to an object.  This
is something many (by no means all) people do struggle with at first, but I
think it's because it's *simpler* than what they're expecting, and it takes
some time before they stop trying to apply the overly convoluted mental
models they bring with them from other languages.  Persist, and the light
will eventually dawn!

> Why can python copy sequences of chars (1-byte strings) trivially:
>
>     s = "this stuff"
>     t = s
>
> But as soon as you have a sequence of, oh, integers for example, you
> can't use the same set-up.  Very odd.

Nope, that example and

    s = [1, 2, 3]
    t = s

are exactly the same.  In the first case, s and t end up naming the same
string ("s is t" returns true after that snippet); in the second case, s and
t end up naming the same array; and after

    s = anything_whatsoever
    t = s

s and t are both names for whatever anything_whatsoever computed.
Absolutely no exceptions.  The semantics of argument-passing work exactly
this way too: the formal argument names are simply bound to the actual
arguments passed by the caller, as if by a sequence of assignment
statements.  Ditto for

    from module import thing

In that case the local name "thing" is attached to the object module.thing.
Ditto for

    import module

There the local name "module" is attached to the module object.

Nothing is copied, no how, no way, never ever.  If you *want* a copy (of a
list, string, or anything else), you have to explicitly make one.
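
A short sketch of what that means in practice, using a list so the shared
object can be seen changing (append_zero is a made-up helper; the full slice
and copy.copy are two of the ways to ask for a copy):

```python
import copy

s = [1, 2, 3]
t = s                 # binds a second name; nothing is copied
t.append(4)           # visible through s as well

def append_zero(seq):
    # seq is bound to the caller's object, as if by "seq = s"
    seq.append(0)

append_zero(s)

u = s[:]              # explicit copy via a full slice
v = copy.copy(s)      # explicit copy via the copy module
u.append(99)          # changes only the copy
```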

Now the interesting question <wink> is why you *think* the string in your
example got copied!  Your mental model is conjuring up an inconsistency
where none actually exists in the language, so "the answer" to that one lies
not in Python but in what you're bringing to it.

Another interesting question is whether by-value or by-reference semantics
are more useful more often.  OO languages almost always pick the "by
reference" answer, and for good reasons both theoretical and practical.
Although sometimes it would be a lot more convenient if e.g. the "Tom
object" could get copied and go to 3 conferences simultaneously <wink>.

>:> ...
>:> GOTCHA: (medium)
>:>    All ranges are up to but *not including* that point.

The intended mental model is that indices point *between* elements, with 0
pointing "to the left" of the leftmost element, and -1 pointing to the left
of the rightmost element.  Let that sink in, and then everything else is
obvious.

A pair of endpoints could have 4 interpretations, depending on whether each
endpoint is inclusive or exclusive; that's why viewing slice notation as
naming the endpoints is a bad idea.  There are no surprises when viewing it
as intended.
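
The between-the-elements picture, sketched on a six-character string (the
variable names are invented for illustration):

```python
s = "python"
#   | p | y | t | h | o | n |
#   0   1   2   3   4   5   6
#  -6  -5  -4  -3  -2  -1

first_two = s[0:2]       # everything between marks 0 and 2
last_one = s[-1:]        # from mark -1 to the right end
all_but_last = s[:-1]    # from the left end up to mark -1

# Two consequences of indices marking gaps rather than elements:
i, j = 2, 5
piece_len = len(s[i:j])     # always j - i for in-range i <= j
rejoined = s[:i] + s[i:]    # any mark splits the string cleanly
```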

Perl worms around a similar problem in "substr" and "splice" by using
offset+length coordinates, although a negative length (in at least "substr";
"splice" too?) is special-cased to mean what it means in Python instead.
Unless you think of

    substr($s, 3, -1);

as meaning "all the characters starting at index 3 up to but not including
the character at position -1" <wink>.

Python's slice notation is Major Goodness, Tom!  It's also Major Goodness
that the same notation works on arrays, strings, tuples and user-defined
sequences.

>: This is explained quite well in the tutorial.  It makes a lot of
>: sense once you get used to it.

> As before, "once you get used to it" is the key point here.

Python isn't particularly concerned about being obvious at first glance
(*no* computer language could be deluded enough to claim that!); it's more
concerned with notations that, once mastered, are very hard to forget or
misuse by mistake.  Don't think "obvious at first", think "obvious forever
after".  And I agree the singleton tuple notation is a wart.

>: Again with the references. :)  Don't think references.

> You can't avoid it.

It *is* a lot like giving up a thirty-year-old alcohol addiction <wink>.  I
think "objects" in Python (Java too, for that matter), almost never
"references".  Objects have names, and that's about it; thinking about
pointers doesn't do much good for you in Python unless you're mucking with
its implementation.

> ...
> Huh?  How would I use a dictionary (don't you get tired of such a long
> and non-intuitive word?) to do this:
>
>     @a[3,7,4] = ("red", "blue", "green")
> or
>     @a[3,7,4] = @a[2,9,10];

Python doesn't have scatter/gather subscripting (note that most of the
arguments you're having about this here are due to "slice" meaning different
things in Python and Perl).  You can do the above in one line by using "map"
and operator.setitem, but a sane person would simply write a loop today, or,
if they did this a lot, capture that loop in a generalized function.  Python
may have gather/scatter subscripting in the future (the NumPy people are
keen for it; but iirc they're the only ones pushing it).
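
One way the loop-in-a-function version might look.  This is a sketch only:
gather and scatter are made-up names, not anything in the library, and the
list is pre-filled because Python lists don't grow on assignment the way
Perl arrays do:

```python
import operator

def gather(seq, indices):
    # Collect seq[i] for each index, in the order given.
    result = []
    for i in indices:
        result.append(seq[i])
    return result

def scatter(seq, indices, values):
    # Store each value at its matching index, modifying seq in place.
    for i, value in zip(indices, values):
        seq[i] = value

a = list(range(11))
scatter(a, [3, 7, 4], ["red", "blue", "green"])   # @a[3,7,4] = (...)
scatter(a, [3, 7, 4], gather(a, [2, 9, 10]))      # @a[3,7,4] = @a[2,9,10]

# The one-line spelling with map and operator.setitem; list() forces the
# calls on interpreters where map is lazy.
b = list(range(11))
list(map(operator.setitem, [b] * 3, [3, 7, 4], ["red", "blue", "green"]))
```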

...
>: del is a statement.  It has to be so it can remove variable
>: bindings.

> I *really* don't understand this. Why does Python have so many
> "statements" compared with C or Perl?

To the small extent that it does (or don't you count "for" as distinct from
"foreach", or "while" from "until", or "if" from "unless"?), it's because
Python's lineage traces more directly to the stmt-oriented Algol than to the
expression-oriented C.

Why do Perl & Python both have so many statements compared to, say, Icon or
Lisp?  Why do they both have so few statements compared to, say, Fortran or
COBOL?  Is there a *real* reason Perl has the precise set of statements it
has, other than that it suited Larry Wall's taste at the time?

Python's "del" needs to be a statement because, e.g., as a function call

    del(a[3:7])

couldn't work, and having used the slice notation consistently everywhere
else, backing off to e.g. splice(a, 3, 4) wouldn't fit with the rest of the
language.
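
A sketch of why the function form falls down: the slice expression is
evaluated before any call happens, so a function only ever sees a new list
(delete is a hypothetical stand-in for such a function):

```python
a = list(range(10))

def delete(piece):
    # Receives the new list that a[3:7] evaluates to, not a window into a.
    del piece[:]

delete(a[3:7])
after_call = a[:]     # a is unchanged: still ten elements

del a[3:7]            # the statement operates on a itself
after_del = a[:]

x = 1
del x                 # removes the binding for the name x
```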

Python's "print" doesn't *need* to be a statement, but is one simply because
it's a common operation and Python requires "()" in function calls -- that
is, it's a statement purely for convenience.

Etc.

> ... I can ... without needing to sacrifice an entire *statement*
> in the grammar for this!

Are statements a scarce resource we need to preserve for future generations?
Python's parser is generated from a formal grammar specification; the latter
is in a file that's 90 lines long, a third of which is blank lines and
comments, and where precedence levels and associativity are fully spelled
out by the grammar structure (not by yaccish shorthands).  IOW, Python's
grammar is exceptionally small for a procedural language as-is (contrast
perly.y).

BTW, when I was writing Python's Emacs mode, Bill Mann worked down the hall
plugging away at Perl mode -- I hadn't before realized man's capacity for
suffering <wink>.

> ...
> There must be something important that I'm not understanding
> here, and I'd like to.

Expressions-vs-statements is a design rift that began no later than Fortran
vs LISP.  Neither Python nor Perl is at either extreme of that spectrum;
both are firmly in the squishy middle.  And only Lutherans suspect there's
something important differentiating them from Episcopalians <wink>.

once-weekly-vs-four-times-a-month-ly y'rs  - tim