some random reflections of a "Python newbie": (2) language issues

Tim Peters tim_one at email.msn.com
Mon Dec 13 05:05:27 EST 1999


[alex at magenta.com]
> ...
> The only problem with not looking at __getitem__'s "key"
> parameter seems to be that there is no way to "restart
> from the beginning" -- but I guess one could easily
> special-case-interpret a key of 0 to mean that.  E.g.:
>
> class fib:
>     "Enumerate the rabbits of a guy from Pisa"
>     def __init__(self):
>         (self.old, self.lat) = (0, 1)
>     def __getitem__(self, key):
>         if key==0:
>             self.__init__()
>             return self.old
>         new = self.old+self.lat
>         (self.old, self.lat) = (self.lat, new)
>         return self.old
>
> Now,
> >>> rabbits=fib.fib()
> >>> for i in rabbits:
> ...     if i>100: break
> ...     print i
> does work as I thought it should.
>
> [... and more about enumerators ...]

The for/__getitem__ protocol was really designed for sequences, and it's a
strain to push it beyond that.  This kind of stuff is cool, but after a few
years you may tire of it <0.7 wink>.

Here's a __getitem__ I've got sitting around in a Set class, that represents
a set of values via a dict self.d:

    def __getitem__(self, i):
        if i == 0:
            self.keys = self.d.keys()
        return self.keys[i]

Even that's a bit of a strain, but it does nest correctly -- albeit by
accident <wink>.
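
A minimal sketch of why that nests, written for present-day Python rather
than the 1999 interpreter. The constructor and the list() call are
assumptions added so the fragment runs on its own (keys() now returns a
view, not a list). Each loop that starts over at index 0 takes a fresh
snapshot of the same keys, so an outer loop that is already under way keeps
indexing equivalent data:

```python
class Set:
    """Dict-backed set driven by the old for/__getitem__ protocol."""

    def __init__(self, items=()):
        # Assumed constructor: members are stored as dict keys.
        self.d = {}
        for x in items:
            self.d[x] = 1

    def __getitem__(self, i):
        if i == 0:
            # A loop is starting: snapshot the current keys.
            self.keys = list(self.d.keys())
        # Indexing past the end raises IndexError, which ends the loop.
        return self.keys[i]


s = Set("abc")
# The inner loop replaces self.keys with an equal list, so the outer
# loop's next index still lands on the right member.
pairs = [(x, y) for x in s for y in s]
```

The nesting only holds while the dict is left alone during the loops; if a
member is added or removed in mid-loop, the outer loop ends up indexing a
different snapshot.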

> ...
> However, there is one little problem remaining... being
> an enumerator doesn't let the object support nested loops
> on itself, which a "real" sequence has no problems with.

Exactly.

> If the object "can give out an enumerator" to itself,
> rather than keeping state itself for the enumeration,
> it would be more general/cleaner.

(At least) The same options are available in Python as in C++.  The relative
burden of method overheads being what they are, though, the aforementioned
Set class also has a method that's much more heavily used than
Set.__getitem__:

    # return set members, as a list
    def tolist(self):
        return self.d.keys()

That is, for/in works much faster on a native sequence type, so I generally
have a "tolist" or "totuple" method and do "for thing in
collection.tolist():".  That doesn't work at all for unbounded collections
(like your rabbits), but is fast and obvious for most collections.
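
Both options can be sketched in present-day Python. The names
FibEnumerator, Fib and enumerator below are hypothetical, chosen only for
illustration, and the Set constructor is assumed. The tolist route hands the
loop a native list; the enumerator route hands out a separate object per
loop, so the collection keeps no loop state and nested loops do not disturb
each other:

```python
class Set:
    def __init__(self, items=()):
        # Assumed constructor: members are stored as dict keys.
        self.d = {}
        for x in items:
            self.d[x] = 1

    def tolist(self):
        # keys() returns a view in current Python, so copy it into a list.
        return list(self.d.keys())


class FibEnumerator:
    """Hypothetical helper: one object per loop, each with its own state."""

    def __init__(self):
        self.old, self.lat = 0, 1

    def __getitem__(self, key):
        # The key is ignored, as in the quoted fib class.
        result = self.old
        self.old, self.lat = self.lat, self.old + self.lat
        return result


class Fib:
    def enumerator(self):
        return FibEnumerator()   # fresh state for every loop


s = Set([1, 2, 3])
products = [x * y for x in s.tolist() for y in s.tolist()]

rabbits = Fib()
pairs = []
for a in rabbits.enumerator():
    if a > 3:
        break
    for b in rabbits.enumerator():
        if b > 3:
            break
        pairs.append((a, b))
```

The Fibonacci enumerator never raises IndexError, so every loop over it
needs its own break.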

> I guess I can live with that through an "enumerator" func,
> somewhat like (if I define __enumerator__ to be the
> "give-out-an-enumerator" method):

Note that double-double-underscore names are technically reserved for the
implementation (i.e., Guido may stomp on any such name in a future release).

> def enumerate(obj):
>     try:
>         return obj.__enumerator__()
>     except:
>         try:
>             return obj[:]
>         except:
>             return obj
>
> (not ideal, sure -- I actually want to give out the obj
> if it's an immutable sequence [can I rely on its having
> __hash__ for that...?],

Classes only have __hash__ if the user defines it.  Contrarily, some mutable
objects do define __hash__; e.g., back to that surprisingly educational
<wink> Set class:

    def __hash__(self):
        if self.frozen:
            hashcode = self.hashcode
        else:
            # The hash code must not depend on the order of the
            # keys.
            self.frozen = 1
            hashcode = 0
            _hash = hash
            for x in self.d.keys():
                hashcode = hashcode ^ _hash(x)
            self.hashcode = hashcode
        return hashcode

This makes it possible to have Sets of Sets (& so on), although putting a
set S in a set T freezes S.  The mutating methods of Set complain if
self.frozen is true.  Most times I'm lazier than that, though, and pass out
hashes without any protection.
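
A self-contained sketch of the freeze-on-hash idea for present-day Python.
The add method, its TypeError and the __eq__ definition are assumptions
filled in for illustration; the post shows only __hash__:

```python
class Set:
    def __init__(self, items=()):
        self.d = {}
        self.frozen = 0
        for x in items:
            self.d[x] = 1

    def add(self, x):
        # Assumed mutator: refuses to change a set that has been hashed.
        if self.frozen:
            raise TypeError("can't change a frozen Set")
        self.d[x] = 1

    def __eq__(self, other):
        # Assumed equality: same members, regardless of order.
        return isinstance(other, Set) and self.d.keys() == other.d.keys()

    def __hash__(self):
        if not self.frozen:
            # XOR is commutative, so key order cannot affect the result.
            self.frozen = 1
            hashcode = 0
            for x in self.d:
                hashcode = hashcode ^ hash(x)
            self.hashcode = hashcode
        return self.hashcode


inner = Set([1, 2])
outer = Set()
outer.add(inner)          # hashing inner freezes it

try:
    inner.add(3)
    complained = False
except TypeError:
    complained = True
```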

The point is that people can & do abuse everything in the language in all
conceivable ways -- and in quite a few that aren't conceivable.  There's
almost nothing you can count on without exception.

> slice it if it's a mutable one, etc, but, basically...), so I can
> have a loop such as "for x in enumerate(whatever):" which _is_
> polymorphic _and_ nestable.
>
> _WAY_ cool...!!!
>
> Btw, a request for stylistic advice -- am I overusing
> try/except in the above sketch for "enumerate"?  Python
> seems to make it so easy and elegant to apply the good
> old "easier to ask for forgiveness than permission" idea,
> that my C++-programmer-instincts of only using exceptions
> in truly exceptional cases are eroding fast -- yet I see
> that recommendation in Guido's own book (p. 192 -- soon
> belied by the example on p. 194 that shows exceptions
> used to implement an anything-but-exceptional typeswitch,
> but...).  Maybe in a smoothly dynamic language such as
> Python keeping "exception purity" is not such a high
> priority as in C++ or Java...?

You'll note that the docs rarely mention which exceptions may be raised, or
when or why; I'm still unsure whether that's Good or Bad!  In the case of
checking an object for __enumerator__, that's what hasattr was designed for,
so most people would instead write

    if hasattr(obj, "__enumerator__"):
        ...

But there's no way to test for whether obj[:] is possible without trying it,
so try/except is your only choice there.  The Types-SIG archive (from about
a year ago) has many delightful words about all this <wink>.
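
Put together, the quoted function might look like the sketch below in
present-day Python. The name enumerate_obj is hypothetical (it avoids
shadowing the later builtin enumerate), and narrowing the except clause to
TypeError and KeyError is an assumption about what a failed obj[:] raises
for common builtin types, not something the post prescribes:

```python
def enumerate_obj(obj):
    # hasattr answers "is the method there?" without a try/except.
    if hasattr(obj, "__enumerator__"):
        return obj.__enumerator__()
    # There is no test for sliceability, so just try it.
    try:
        return obj[:]
    except (TypeError, KeyError):
        # Assumed failure modes: not subscriptable, or a mapping that
        # rejects a slice as a key.
        return obj


class Countdown:
    """Hypothetical class supplying its own enumerator."""

    def __enumerator__(self):
        return [3, 2, 1]


original = [1, 2, 3]
copy_of_list = enumerate_obj(original)
mapping = {"a": 1}
same_mapping = enumerate_obj(mapping)
from_method = enumerate_obj(Countdown())
```

A user-defined __getitem__ can raise anything at all, so the narrowed
except clause trades some of the original's generality for not hiding
unrelated errors.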

> How would/should I implement "enumerate" without so much
> reliance on try/except, i.e. by explicitly testing "does
> this object have an __enumerator__ method that can be
> called without arguments, or, if not, can I slice it to get
> a copy, or, ...", etc...?  In C++ I guess I'd do a
> dynamic_cast<> for this kind of thing (assuming inheritance
> from suitable abstract classes) -- what's the "best" Python
> idioms, and what are the trade-offs here...?

A dynamic language presents a great temptation to hyper-generalization.
Resist it!  Not for your sake so much as for ours -- we'll never be able to
understand your code.

If you want to define a new all-encompassing protocol (like __enumerator__),
chances are excellent you're the only person in the world who will use it --
much as if 8 groups within a department are using C++, they'll maintain at
least 16 incompatible array classes <0.5 wink>.

If it's something you can't live without, then -- as its likely sole
user -- the tradeoffs are entirely up to your judgment.  Indeed, *I'm* the
world's only user of my Set class:  everyone else just manipulates a dict
explicitly!  "Programming infrastructure" classes & frameworks have a much
bigger audience in C++/Java, and for good reasons:  it takes many more lines
of code to get something done in those languages.  An enumerator in Python
usually takes no more than 4 lines of dirt-simple code, so people
instinctively realize it would take longer to read & understand the
__enumerator__ docs than to roll their own one-shots 100 times over.

That's why Python is so productive:  nobody spends any time reading docs
<wink>.
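
For a sense of scale, here is one such roll-your-own enumerator, sketched
for present-day Python. The Squares class is a made-up example, not
something from the thread. The loop protocol needs only a __getitem__ that
raises IndexError when the items run out:

```python
class Squares:
    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError(i)   # tells the for loop to stop
        return i * i


squares = [x for x in Squares(4)]
# No per-loop state is kept on the object, so nested loops are safe.
nested = [(a, b) for a in Squares(3) for b in Squares(3)]
```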

>> [1.6 is supposed to add a __contains__ method]

> Oh my -- I even guessed the *name* of the special method
> needed...?!-)  That cuts it, I guess -- Python and I were
> just _made_ for each other!-).

Python was made for everyone equally, Alex -- although I am pleased to say
you do seem to be *especially* equal <wink>.

python-farm-ly y'rs  - tim