[Python-3000] Python-3000 Digest, Vol 9, Issue 27

Tue Nov 14 22:38:54 CET 2006

On 11/14/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Having a rich method API vs having a narrow method API and duck-typed support
> functions is a design trade-off. In the case of sequences and mappings, the
> trade-off went towards a richer API because the de facto reference
> implementations were the builtin dict and list classes (which is why DictMixin
> and ListMixin are so useful when implementing your own containers). In the
> case of iterables and iterators, the trade-off went towards the narrow API so
> that the interface could be used in a wide variety of situations (lines in a
> file, records in a database, characters in a string, bytes from a serial port,
> frames in a bowling game, active players in a MMORPG, etc, etc, etc).

To augment Nick's excellent explanation, note that you can't restart
an iterator or go back ("unyield" a value).  There was a conscious
decision to limit the iterator interface to make it as easy as
possible for some unknown future object to implement an iterator.
Now, anything with a compliant .next() method is an iterator.  Nothing
prevents a particular iterator from having reset or backstepping
methods, but these are not required for all iterators.  I sense this
is the same reason people are opposed to your proposal.  I'll not
comment on the merits of having iter() return a rich Iterator object
because I don't understand well enough what we might lose.  The real
problem is that iterator is an interface, and there's no formal way to
express interfaces in Python; it's all in the documentation.  That's
why the relationship between dict and "mapping type" seems so
nebulous, and also why an iterator looks like the same kind of object
as a dict or list but it isn't.
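
A minimal sketch of how small that interface is.  The Countdown class
is hypothetical, and the sketch assumes a current Python, where the
method is spelled __next__() rather than .next().  Note that nothing in
it supports restarting or stepping back.

```python
class Countdown:
    """Hypothetical iterator: yields n, n-1, ..., 1 and then stops."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Spelled .next() in Python 2; __next__() in Python 3.
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value


it = Countdown(3)
first_pass = list(it)    # consumes the iterator
second_pass = list(it)   # nothing left: the protocol has no restart
```

A particular class could add a reset() method of its own, but callers
that only know "this is an iterator" cannot rely on it.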

The tradeoff between rich objects + methods and minimal objects +
functions is pervasive in Python.  len(x) is a function because it
applies to a wide variety of objects -- not all of them known yet --
rather than to a certain class hierarchy.  One major success of the
iterator protocol was when file objects became iterable.  No more
"while loop with break in the middle": now you can just do a simple
"for line in file".  If the file iterator were required to support "+"
and your other rich methods, it might be difficult to implement.  What
if the file is tied to a socket-like stream instead of a disk file?
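
For comparison, here are the two idioms side by side.  This is only a
sketch: the helper names are made up, and io.StringIO stands in for a
real file so the example is self-contained.

```python
import io


def read_lines_old_style(f):
    """The pre-iterator idiom: a while loop with a break in the middle."""
    lines = []
    while True:
        line = f.readline()
        if not line:
            break
        lines.append(line)
    return lines


def read_lines_iter(f):
    """The same loop written against the iterator protocol."""
    lines = []
    for line in f:
        lines.append(line)
    return lines


text = "alpha\nbeta\ngamma\n"
old = read_lines_old_style(io.StringIO(text))
new = read_lines_iter(io.StringIO(text))
```

The second version asks the file object for nothing beyond "give me the
next line", which is why a stream-backed object can support it too.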

When you need a mapping, the obvious answer is, "Use a dict!"  A dict
is an empty container with infrastructure for adding and managing
items.  When a file object needs to iterate over the lines in a file, it
can't "use an iterator"; it has to *be* an iterator.  A generic iterator
object doesn't know *how* to iterate over a file; that's what your
class has to do.  But in order to do that it has to implement the
iterator interface... and we're right back where we started.

A "listtools" package is not a bad idea, actually.  A place to gather
list-like objects and their functions so they aren't scattered
throughout the library.  "collections" almost does it, although it's
defined a bit too narrowly for function libraries (but that could be
changed).  But this is a minor issue.

The thing with "mappingtools" is, what would such a package contain?
I've sometimes wondered why there's only one mapping type after so
many years.  But the real question is, what kind of "mapping type"
functionality is there that a dict doesn't provide?  None that I can
think of.  "ordered dict" and "default dict" can be handled by
subclasses.
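
As a rough illustration of the subclass approach, here is a "default
dict" sketch.  DefaultingDict and its factory argument are hypothetical
names; the only hook it relies on is __missing__, which dict subclasses
can define to handle absent keys.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass that creates missing values on demand."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ when the key is absent.
        value = self[key] = self.factory()
        return value


counts = DefaultingDict(int)
for word in "a b a c a".split():
    counts[word] += 1
```

Everything else (iteration, len(), "in", update() and so on) is
inherited from dict unchanged.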

The problem with .__getitem__ is, you can't tell whether an object is
a sequence or a mapping.  If it has .__getitem__, it's one or the
other, but you don't know whether it accepts sequential integers
starting from 0, or arbitrary keys.  This is not really .__getitem__'s
fault since there's only one [] operator to access both.  It's the
lack of interfaces again: the lack of a universal way to say "this is
a sequence type".  All the routine can do is document what it expects
in arguments, and hope that the caller heeds it.
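
A small sketch of the ambiguity.  first_item is a hypothetical routine
that documents "pass me a sequence" but has no way to enforce it.

```python
def first_item(container):
    """Hypothetical routine: expects a sequence, but cannot check for one."""
    return container[0]


as_list = ["zero", "one"]
as_dict = {0: "mapped zero", "name": "value"}

# Both objects advertise the same method...
both_have_getitem = (hasattr(as_list, "__getitem__")
                     and hasattr(as_dict, "__getitem__"))

from_list = first_item(as_list)   # 0 is a position
from_dict = first_item(as_dict)   # 0 is just a key that happens to exist

# ...and a mapping without the key 0 fails only at call time.
try:
    first_item({"name": "value"})
    failed = False
except KeyError:
    failed = True
```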

As for itertools, I rarely use it.  When I do, it's one or two functions
at a time.  And often I'm not actually using the functions, just studying
the implementation so I can write a custom function that does what I
need.  I suppose it would be nice to chain iterators with "+", but I
chain iterators so rarely it's no big deal.  I suspect many other
programmers are the same way.  What I'd most like to see in itertools
are the functions on the "Recipes" page.  Why should everyone have to
paste the code for no/quantify/flatten rather than having one central
function for them?  But I expect this will happen one by one as
certain recipes get popular.
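
For reference, here is roughly what those three recipes do, plus
chaining with itertools.chain.  The bodies are written from the
recipes' described behaviour rather than copied from the docs, so treat
the exact signatures as assumptions.

```python
from itertools import chain


def quantify(iterable, pred=bool):
    """Count how many items satisfy pred."""
    return sum(1 for item in iterable if pred(item))


def no(iterable, pred=bool):
    """Return True if pred(item) is false for every item."""
    return not any(pred(item) for item in iterable)


def flatten(list_of_lists):
    """Flatten one level of nesting."""
    return chain.from_iterable(list_of_lists)


# Chaining today is a function call rather than "+".
chained = list(chain([1, 2], (3, 4)))
```

Each one is a line or two, which is probably why people paste them
instead of asking for them to be added.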

-- 
Mike Orr <sluggoster at gmail.com>