[Python-Dev] collections module

Fri Jan 9 08:25:11 EST 2004

From: Gareth McCaughan
> I think we should be more concerned about how useful a new
> feature is than about what "message" it gives about existing
> features. Anyway, the message that will actually be given
> is that for some purposes lists and dictionaries are not
> the best data structures. That's obviously true, so why
> try to avoid saying it?

You're right. But on reading your response, I think that I didn't
manage to express my concern very well (I shouldn't compose emails
while getting continually interrupted).

What I was trying to say was that there are benefits to having an
implementation in Python [1] (if that is possible). The key benefit
of a C implementation seems to me to be that it is faster. That's
fine, but it's important to have a balanced view - performance
isn't overwhelmingly more important than other qualities.

[1] For example:

      - Maintainability
      - Usefulness as an example of good coding practices (on
        the assumption that we're after good *Python* practices,
        rather than good C ones...)
      - Accessibility for reference/documentation
      - Reusability (maybe less so if the C version is designed
        to allow subclassing)

>> That is true, but equally the array module has a feeling of being
>> "specialised". I'm not sure I can quantify this, but your
>> description of the collection module doesn't feel similarly
>> "specialised".
>
> Being less "specialized" is a matter of being more broadly
> useful, no? That seems like a good thing to me.

I expressed that badly. If I had 100 numbers to store, I'd use a
list and never even think of the array module. In other words, the
array module feels "specialised" in the sense that the tasks it is
good at (i.e., where it beats a generic list) are narrower than
"just" storing homogeneous data. The implication is that it
sacrifices generality for other useful features (not just
performance; conversion to a binary format also comes to mind),
*and the trade-off is worth it*.
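
As an illustration of that trade-off, here is a minimal sketch using the
standard array module. It assumes a current Python 3 interpreter, where the
binary-conversion methods are spelled tobytes/frombytes (older releases
used different names); the variable names are illustrative only.

```python
from array import array

numbers = list(range(100))

# A list accepts anything; an array is restricted to one C type.
packed = array("i", numbers)        # typecode "i": signed C int

# Generality is given up: a float is rejected by an int array.
try:
    packed.append(1.5)
    rejected = False
except TypeError:
    rejected = True

# In exchange, the contents convert directly to and from raw bytes.
raw = packed.tobytes()
restored = array("i")
restored.frombytes(raw)
```

For 100 numbers a plain list is still the obvious choice; the array only
pays off when the compact storage or the byte conversion is actually needed.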

My characterisation of "less specialised" was meant in the same sense
as is implied by the phrase "jack of all trades, master of none". On
reflection, I'm not sure it applies here. So I'll concede that point.

> If Python's existing library has warped users' mental models
> so that they think of a queue primarily as "something you use
> with threads" rather than as "a data structure where you can
> add things to the front and pull them off the rear efficiently"
> then that's a *bad* thing about Python's existing library, and
> decoupling the two concepts by adding a fast queue implementation
> will be a win.

Good point. On that basis, I'd support using the name queue in the
collections module, and maybe even going so far as to suggest that
Queue.Queue be renamed (with the old name retained for compatibility)
to something like Threading.FIFO (or maybe Threading.channel, like
Occam/CSP, but that'd clash with Stackless...).
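
To make the "queue as a data structure" sense concrete, here is a rough
pure-Python sketch with no threading involved. The class name FifoQueue and
its put/get methods are hypothetical, chosen for illustration; this is not
the API proposed for the collections module. It uses the well-known
two-list technique so that each item is appended once, reversed once, and
popped once.

```python
class FifoQueue:
    """Hypothetical pure-Python FIFO queue (illustration only)."""

    def __init__(self, items=()):
        self._incoming = list(items)   # new items are appended here
        self._outgoing = []            # oldest item sits at the end of this list

    def put(self, item):
        self._incoming.append(item)

    def get(self):
        if not self._outgoing:
            # Reverse the pending items once so the oldest ends up last,
            # then hand the whole batch over to the outgoing list.
            self._incoming.reverse()
            self._outgoing, self._incoming = self._incoming, []
        if not self._outgoing:
            raise IndexError("get from an empty queue")
        return self._outgoing.pop()

    def __len__(self):
        return len(self._incoming) + len(self._outgoing)


q = FifoQueue([1, 2, 3])
q.put(4)
drained = [q.get() for _ in range(len(q))]
```

Nothing here blocks or locks, which is the distinction from Queue.Queue:
the threaded class is a synchronisation tool built around a queue, while
this is only the queue.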

>> Apologies for going on about performance.
>
> It makes a refreshing change to hear someone saying "No,
> don't do that, it would be too fast" :-).

:-)

> If performance of a Python implementation is a problem,
> then it is unlikely that any plausible modifications to
> the VM will stop it being a problem.

You may well be right. I just dislike the "code it in C, because
Python isn't fast enough" attitude. I overreacted here, based on an
assumption (Raymond hasn't even said that he intends this module to
be in C!). Sorry.

Paul.