Performance: sets vs dicts.
Terry Reedy
tjreedy at udel.edu
Tue Aug 31 19:24:19 EDT 2010
On 8/31/2010 10:09 AM, Aahz wrote:
> In article<mailman.239.1283257496.29448.python-list at python.org>,
> Jerry Hill<malaclypse2 at gmail.com> wrote:
>> On Mon, Aug 30, 2010 at 7:42 PM, Aahz<aahz at pythoncraft.com> wrote:
>>>
>>> Possibly; IMO, people should not need to run timeit to determine basic
>>> algorithmic speed for standard Python datatypes.
>>
>> http://wiki.python.org/moin/TimeComplexity takes a stab at it. IIRC,
>> last time this came up, there was some resistance to making promises
>> about time complexity in the official docs, since that would make
>> those numbers part of the language, and thus binding on other
>> implementations.
>
> I'm thoroughly aware of that page and updated it yesterday to make it
> easier to find. ;-)
>
> However, I think there are some rock-bottom basic guarantees we can make
> regardless of implementation. Does anyone seriously think that an
> implementation would be accepted that had anything other than O(1) for
> index access into tuples and lists?
Does anyone seriously think that an implementation should be rejected as
an implementation if it intelligently did seq[n] lookups in log2(n)/31
time units for all n (as humans would do), instead of stupidly taking 1
time unit for all n < 2**31 and rejecting all larger values (as 32-bit
CPython does)?
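For concreteness, a minimal timeit sketch (assuming current CPython on
typical hardware; exact numbers will vary) of the constant-time indexing
under discussion, showing that seq[n] takes essentially the same time
wherever in the list the index points:

import timeit

# Not from the thread; a rough measurement sketch.
big = list(range(10_000_000))

for idx in (0, 1_000, 1_000_000, 9_999_999):
    t = timeit.timeit("big[idx]", globals={"big": big, "idx": idx},
                      number=1_000_000)
    print(f"big[{idx}]: {t:.3f} s per 1,000,000 lookups")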
> Dicts that were not O(1) for access with non-pathological hashing?
You would actually be unhappy if small dicts ran even faster than they
do now (while large dicts ran the same)? Not me!
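In the same hedged spirit, a quick sketch comparing a lookup in a small
dict with one in a large dict; both count as O(1), yet nothing in that
notation forbids the small one from being measurably faster:

import timeit

# Again only an illustrative sketch; timings depend on machine and build.
small = {i: i for i in range(8)}
large = {i: i for i in range(1_000_000)}

for name, d, key in (("small", small, 3), ("large", large, 765_432)):
    t = timeit.timeit("d[key]", globals={"d": d, "key": key},
                      number=1_000_000)
    print(f"{name} dict: {t:.3f} s per 1,000,000 lookups")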
> I suggest that we should agree on these guarantees and document them in
> the core.
I disagree for several reasons.
1. It would be a category mistake. Python is an abstract algorithm
language. The concept of speed is not applicable. I happen to think it
one of the best, which is why I once dubbed it 'executable pseudocode',
to compare it to similar but inferior, non-executable algorithm
languages still too often used today. To human readers, as readers, the
speed of CPython or anything else is not relevant.
2. A Python interpreter is an abstract algorithm. Algorithms can be
characterized by operation counts, but not by physical universe time.
3. Algorithms, including Python interpreters, can be embodied in
anything that can compute, including humans, von Neumann machines of
various types, and possible future non-von Neumann machines. To limit
'Python interpreter' to current von Neumann machines would be
shortsighted. Human interpretation of Python code (and other
specifications) is part of development, testing, and debugging.
4. Guarantee? Who will pay what to whom if what is not performed? ;-)
The Python license intentionally disclaims any guarantee of performance.
If you are using 'guarantee' metaphorically, perhaps try another word.
5. Accepted? If you want to claim that an interpreter that is useless
for your purposes is not really an interpreter, you are free to, I
suppose. Asking the PSF to impose your view on me would, to me, violate
its diversity policy.
6. O-notation was developed for characterizing the asymptotic (large-N)
behavior of abstract algorithms in terms of abstract operation counts.
Left out are differences between operations, lower order terms,
multiplicative constants, how large N must be to get large-N behavior,
and what happens for smaller N. These and other factors make the
translation of O(f(n)) into real time extremely mushy or even wrong.
I already suggested that O(1) lookup is a symptom of simple-minded
arithmetic stupidity, of doing unnecessary work when adding small
numbers. When it is the result of doing most additions slower than
necessary, it is a bad thing, not a good thing, and a bad criterion for
an acceptable algorithm and an acceptable interpreter. Similarly, books
say that O(n*logn) sorting is 'better' than O(n**2) sorting. However,
Tim Peters verified that the opposite is true for real times for small
n, and hence the current list.sort smartly uses a fast O(n**2) algorithm
for small lists (len < 64) and small runs in larger lists. Rejecting
list.sort for this reason would be stupid.
--
Terry Jan Reedy