Ordering python sets

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Oct 26 02:53:18 CEST 2008

On Sat, 25 Oct 2008 21:53:10 +0000, Lie Ryan wrote:

> On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:
>> On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
>>> <anotherrandommusing>
>>> Since python is dynamic language, I think it should be possible to do
>>> something like this:
>>> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b =
>>> dict({'a': 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
>>> implementation = 'binarytree')
>> Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
>> When I see a dict, I want to know that any two dicts work the same way.
> Oh no, the two dict implementation would work _exactly_ the same from
> the outside, they are transparently interchangeable. Only the
> performance characteristic differs because of the different
> implementation.

Exactly. That was my point.

>> I don't want to have to search the entire project's source code to find
>> out if it is a dict implemented as a hash table with O(1) lookups, or a
>> dict implemented as a binary tree with O(log N) lookups, or a dict
>> implemented as a linear array with O(N) lookups.
> No, you'd only need to look at the dict's creation point (or actually
> much better at projects docs, but not everyone create good docs).

And how do you find an arbitrary object's creation point without 
searching the project's source code?

>> If I wanted that sort of nightmare, I can already do it by shadowing
>> the builtin:
>> dict = binarytree
>> D = dict({'a': 'A'})  # make a binary tree
> I DON'T want THAT sort of nightmare you mentioned... And it'd be
> impossible to have two dictionary that have two different
> implementations.


dict = binarytree
D1 = dict({'a': 'A'})  # make a binary tree "dict"
dict = __builtin__.dict
D2 = dict({'a': 'A'})  # make a standard dict
dict = someothertype
D3 = dict({'a': 'A'})

I'm not suggesting this is a good idea. This is a terrible idea. But it 
is not much worse than your idea:

D1 = dict({'a': 'A'}, implementation='binarytree')
D2 = dict({'a': 'A'}, implementation='dict')
D3 = dict({'a': 'A'}, implementation='someothertype')

>> There is no possible good that come from this suggestion. The beauty of
>> Python is that the built-in data structures (list, dict, set) are
>> powerful enough for 99% of uses[1], and for the other 1%, you can
>> easily and explicitly use something else.
> Oh really? As far as I know, python's list is extremely bad if you're
> inserting data at the beginning of the list

And how often do you do that?

And when you do, use a deque. Just call it a deque.

>> But *explicitly* is the point. There's never any time where you can do
>> this:
> Yes, true, explicitly IS the point. How more explicit can you be than:
> dict({foo: bar}, implementation = 'binarytree')
>> type(mydict) is dict

You miss the point. With your plan, you can do this:

D1 = dict({foo: bar}, implementation = 'binarytree')
D2 = dict({foo: bar}, implementation = 'dict')
type(D1) is type(D2)

and yet D1 and D2 have UTTERLY different performance characteristics. So 
now you need to add ANOTHER test to distinguish dicts-which-are-dicts 
from dicts-which-are-binary-trees:

D1.implementation != D2.implementation

And why? So you can avoid calling a goose a goose, and call it a duck 

> If my memory serves right, binary tree dict and hashed table dict is
> both a dict right? (Duck Typing)
> Only their implementation differs. Implementation is... well,
> "implementation detail".

Duck typing refers to *interface*, not implementation. I have no problem 
with you using a type with the same interface as a dict. That's what duck 
typing is all about. Just don't call it a dict!

>> and not know exactly what performance characteristics mydict will have.
> Oh... why do I need to know what the performance characteristic of
> mydict is? Unless I know what I'm doing.


Because when you do this:

mydict[key] = 1

it's important whether each dict lookup is O(1), O(log N) or O(N). For a 
dict with one million items, that means that an implementation based on a 
binary tree does O(20) times more processing than a dict, and an 
implementation based on linear searching does O(1000000) times more 

If you think implementation details don't matter, try this:

s1 = 'c'*(10**6)


s2 = ''
for i in xrange(10**6):
    s2 = 'c' + s2  # defeat optimizer

>> (Unless you shadow dict or type, or otherwise do something that breaks
>> the rules.) You never need to ask, "Okay, it's a dict. What sort of
>> dict?"
> Okay, it's a dict. What sort of dict? Who the hell cares? 

If you don't care, then why are you specifying the implementation type?

mydict = dict({'foo': 'bar'}, implementation="surprise me!")

You can't have it both ways. If you care, then you know enough to want a 
hash table based dict (the standard) or a binary tree or something else. 
So go ahead and use whatever data structure you want. Just don't call it 
a dict.

But if you don't care, then just use the Python standard data types.

> I don't need
> to know, they all looks and behave the same (Duck Typing)...

Until you try a simple operation like len(mydict) and it takes three 
minutes because the implementation you choose is really inefficient.

> Sometimes we need a data type to use a specific data structure that have
> some specific performance characteristic, because we know we'll be doing
> a specific operation a lot more than other operations.

I'm not arguing that you should never use any other data structure. Go 
right ahead and use them all you like. 

>> If you want a binary tree, ask for a binary tree.
> Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
> implementation = 'binarytree')

Or you could do it the right way:

D1 = binarytree(data)
D2 = dict(data)
type(D1), type(D2)
=> returns binarytree, dict


D1 = dict(data, implementation='binarytree')
D2 = dict(data)
type(D1), type(D2)
=> returns dict, dict


More information about the Python-list mailing list