Mailman 3 Re: [Python-Dev] collections module - Python-Dev

Jan. 9, 2004

      Paul Moore wrote:
...
The only reason I can see for coding in C is performance. And performance
just isn't that big a deal to me. And we don't want to give a message
that "dictionaries and lists are slow" by offering "fast" versions of
things which can be coded in terms of them.
I think we should be more concerned about how useful a new
feature is than about what "message" it gives about existing
features. Anyway, the message that will actually be given
is that for some purposes lists and dictionaries are not
the best data structures. That's obviously true, so why
try to avoid saying it?

[Bags:]
...
In terms of "making dictionaries look slow", how would you answer
the question "Why use this rather than just dict.get(k, 0) + 1"
without giving the impression that dictionaries "aren't fast enough"?
If the advantage *is* that bags are faster, then I don't
think there's any reason not to say so. If it gives the
impression that dictionaries "aren't fast enough" for the
particular tasks bags are designed for, what's so terrible?
The result would be that when people want bags they use
bags instead of emulating them with dictionaries; is that
bad?

(I'm not convinced that there would, in fact, be much
speed advantage for bags over dictionaries. I'd guess
that the real advantage would just be that some things
can be expressed more clearly in terms of bags than
in terms of dictionaries. I suspect there isn't much
clarity advantage either...)
...
...
The second type would be a high speed queue
This seems like a good idea. I have found myself on occasion
wanting to use Queue.Queue in non-threaded programs, simply because
a "queue" fits my mental model of what I am doing. But it's for
code readability reasons - not performance.
I've often built my own queue using lists, with x.append(item)
and x.pop() as the enqueue and dequeue operations. That's
horribly inefficient, but makes for reasonably neat code.
If Python grew nice fast queues with a nice interface, I'd
use those instead. It wouldn't make much difference to the
clarity of my code, but it would make it a lot faster. That
seems like a good thing to me.
...
And the same issue here over performance impressions - how do you
make this seem like a good alternative to list.pop, without making
lists look slow?
Lists *are* slow when you use them for queues. What's wrong
with admitting that?
...
...
This module could also serve as the "one obvious place to put it"
for other high performance datatypes that might be added in the
future (such as fibheaps or some such).
That is the biggest benefit I see to this proposal. There have
recently been a number of discussions of useful library code, where
much of the problem seemed to be about where to put the code,
rather than about semantics, etc.
I concur.
...
...
Guido wanted this to be discussed by a number of people.  While
a rich set of collection classes have proven their worth in other
contexts, it could be that more is not better for python.  One the
language's elegances is the ability to do just about anything with
just lists and dicts.  If you have too many mutable containers to 
choose from, the choice of which to use becomes less obvious.
My thoughts are that the existence of the array module, for example, 
has never impaired anyone's ability to learn or use lists and dicts.
That is true, but equally the array module has a feeling of being
"specialised". I'm not sure I can quantify this, but your description
of the collection module doesn't feel similarly "specialised".
Being less "specialized" is a matter of being more broadly
useful, no? That seems like a good thing to me.
...
...
Likewise, queues are in a mental concept space distinct from dicts, 
sets, and such.  My only worry with queues is differentiating
them from Queue objects which have builtin synchronization for 
use in threads.
Queues currently "feel like" a threading concept in Python (as opposed
to what they feel like in a more general data structures concept).
Having a queue in a collections module would clash with that. I'm not
against it, but I believe your worry is important - you're fiddling
with people's mental models, something bound to generate controversy...
If Python's existing library has warped users' mental models
so that they think of a queue primarily as "something you use
with threads" rather than as "a data structure where you can
add things to the front and pull them off the rear efficiently"
then that's a *bad* thing about Python's existing library, and
decoupling the two concepts by adding a fast queue implementation
will be a win.
...
Apologies for going on about performance.
It makes a refreshing change to hear someone saying "No,
don't do that, it would be too fast" :-).
...
I am in favour of the idea of
a module containing implementations of standard container data structures,
but I believe it should be in Python, and should be presented as the
"best of breed" implementation in terms of handling corner cases, and
covering subtle algorithmic issues, and providing good design, and
offering subclassability where needed. Basically, "everyone can, and
often does, reinvent this stuff in Python - but getting the details
right is tricky, so we've done it for you."
If performance of a Python implementation is a problem, speed up the
bottlenecks in the Python VM (likely to be function calls, I'd guess)
rather than spending effort reimplementing in C...
If performance of a Python implementation is a problem,
then it is unlikely that any plausible modifications to
the VM will stop it being a problem.

-- 
g

Re: [Python-Dev] collections module

Gareth McCaughan

tags

participants (1)