[Python-Dev] collections module

Gareth McCaughan gmccaughan at synaptics-uk.com
Fri Jan 9 07:33:42 EST 2004


Paul Moore wrote:

> The only reason I can see for coding in C is performance. And performance
> just isn't that big a deal to me. And we don't want to give a message
> that "dictionaries and lists are slow" by offering "fast" versions of
> things which can be coded in terms of them.

I think we should be more concerned about how useful a new
feature is than about what "message" it gives about existing
features. Anyway, the message that will actually be given
is that for some purposes lists and dictionaries are not
the best data structures. That's obviously true, so why
try to avoid saying it?

[Bags:]
> In terms of "making dictionaries look slow", how would you answer
> the question "Why use this rather than just dict.get(k, 0) + 1"
> without giving the impression that dictionaries "aren't fast enough"?

If the advantage *is* that bags are faster, then I don't
think there's any reason not to say so. If it gives the
impression that dictionaries "aren't fast enough" for the
particular tasks bags are designed for, what's so terrible?
The result would be that when people want bags they use
bags instead of emulating them with dictionaries; is that
bad?

(I'm not convinced that there would, in fact, be much
speed advantage for bags over dictionaries. I'd guess
that the real advantage would just be that some things
can be expressed more clearly in terms of bags than
in terms of dictionaries. I suspect there isn't much
clarity advantage either...)

>> The second type would be a high speed queue
> 
> This seems like a good idea. I have found myself on occasion
> wanting to use Queue.Queue in non-threaded programs, simply because
> a "queue" fits my mental model of what I am doing. But it's for
> code readability reasons - not performance.

I've often built my own queue using lists, with x.append(item)
and x.pop() as the enqueue and dequeue operations. That's
horribly inefficient, but makes for reasonably neat code.
If Python grew nice fast queues with a nice interface, I'd
use those instead. It wouldn't make much difference to the
clarity of my code, but it would make it a lot faster. That
seems like a good thing to me.

> And the same issue here over performance impressions - how do you
> make this seem like a good alternative to list.pop, without making
> lists look slow?

Lists *are* slow when you use them for queues. What's wrong
with admitting that?

>> This module could also serve as the "one obvious place to put it"
>> for other high performance datatypes that might be added in the
>> future (such as fibheaps or some such).
> 
> That is the biggest benefit I see to this proposal. There have
> recently been a number of discussions of useful library code, where
> much of the problem seemed to be about where to put the code,
> rather than about semantics, etc.

I concur.

>> Guido wanted this to be discussed by a number of people.  While
>> a rich set of collection classes have proven their worth in other
>> contexts, it could be that more is not better for python.  One the
>> language's elegances is the ability to do just about anything with
>> just lists and dicts.  If you have too many mutable containers to 
>> choose from, the choice of which to use becomes less obvious.  
>>
>> My thoughts are that the existence of the array module, for example, 
>> has never impaired anyone's ability to learn or use lists and dicts.
> 
> That is true, but equally the array module has a feeling of being
> "specialised". I'm not sure I can quantify this, but your description
> of the collection module doesn't feel similarly "specialised".

Being less "specialized" is a matter of being more broadly
useful, no? That seems like a good thing to me.

>> Likewise, queues are in a mental concept space distinct from dicts, 
>> sets, and such.  My only worry with queues is differentiating
>> them from Queue objects which have builtin synchronization for 
>> use in threads.
> 
> Queues currently "feel like" a threading concept in Python (as opposed
> to what they feel like in a more general data structures concept).
> Having a queue in a collections module would clash with that. I'm not
> against it, but I believe your worry is important - you're fiddling
> with people's mental models, something bound to generate controversy...

If Python's existing library has warped users' mental models
so that they think of a queue primarily as "something you use
with threads" rather than as "a data structure where you can
add things to the front and pull them off the rear efficiently"
then that's a *bad* thing about Python's existing library, and
decoupling the two concepts by adding a fast queue implementation
will be a win.

> Apologies for going on about performance.

It makes a refreshing change to hear someone saying "No,
don't do that, it would be too fast" :-).

>                                           I am in favour of the idea of
> a module containing implementations of standard container data structures,
> but I believe it should be in Python, and should be presented as the
> "best of breed" implementation in terms of handling corner cases, and
> covering subtle algorithmic issues, and providing good design, and
> offering subclassability where needed. Basically, "everyone can, and
> often does, reinvent this stuff in Python - but getting the details
> right is tricky, so we've done it for you."
> 
> If performance of a Python implementation is a problem, speed up the
> bottlenecks in the Python VM (likely to be function calls, I'd guess)
> rather than spending effort reimplementing in C...

If performance of a Python implementation is a problem,
then it is unlikely that any plausible modifications to
the VM will stop it being a problem.

-- 
g





More information about the Python-Dev mailing list