Ordering python sets
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sat Oct 25 20:53:18 EDT 2008
On Sat, 25 Oct 2008 21:53:10 +0000, Lie Ryan wrote:
> On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:
>
>> On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
>>
>>> <anotherrandommusing>
>>> Since python is dynamic language, I think it should be possible to do
>>> something like this:
>>>
>>> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b =
>>> dict({'a': 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
>>> implementation = 'binarytree')
>>
>> Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
>>
>> When I see a dict, I want to know that any two dicts work the same way.
>
> Oh no, the two dict implementation would work _exactly_ the same from
> the outside, they are transparently interchangeable. Only the
> performance characteristic differs because of the different
> implementation.
Exactly. That was my point.
[...]
>> I don't want to have to search the entire project's source code to find
>> out if it is a dict implemented as a hash table with O(1) lookups, or a
>> dict implemented as a binary tree with O(log N) lookups, or a dict
>> implemented as a linear array with O(N) lookups.
>
> No, you'd only need to look at the dict's creation point (or actually
> much better at projects docs, but not everyone create good docs).
And how do you find an arbitrary object's creation point without
searching the project's source code?
>> If I wanted that sort of nightmare, I can already do it by shadowing
>> the builtin:
>>
>> dict = binarytree
>> D = dict({'a': 'A'}) # make a binary tree
>
> I DON'T want THAT sort of nightmare you mentioned... And it'd be
> impossible to have two dictionary that have two different
> implementations.
Nonsense.
dict = binarytree
D1 = dict({'a': 'A'}) # make a binary tree "dict"
dict = __builtin__.dict
D2 = dict({'a': 'A'}) # make a standard dict
dict = someothertype
D3 = dict({'a': 'A'})
I'm not suggesting this is a good idea. This is a terrible idea. But it
is not much worse than your idea:
D1 = dict({'a': 'A'}, implementation='binarytree')
D2 = dict({'a': 'A'}, implementation='dict')
D3 = dict({'a': 'A'}, implementation='someothertype')
>> There is no possible good that come from this suggestion. The beauty of
>> Python is that the built-in data structures (list, dict, set) are
>> powerful enough for 99% of uses[1], and for the other 1%, you can
>> easily and explicitly use something else.
>
> Oh really? As far as I know, python's list is extremely bad if you're
> inserting data at the beginning of the list
And how often do you do that?
And when you do, use a deque. Just call it a deque.
[...]
>> But *explicitly* is the point. There's never any time where you can do
>> this:
>
> Yes, true, explicitly IS the point. How more explicit can you be than:
> dict({foo: bar}, implementation = 'binarytree')
>
>> type(mydict) is dict
You miss the point. With your plan, you can do this:
D1 = dict({foo: bar}, implementation = 'binarytree')
D2 = dict({foo: bar}, implementation = 'dict')
type(D1) is type(D2)
and yet D1 and D2 have UTTERLY different performance characteristics. So
now you need to add ANOTHER test to distinguish dicts-which-are-dicts
from dicts-which-are-binary-trees:
D1.implementation != D2.implementation
And why? So you can avoid calling a goose a goose, and call it a duck
instead.
> If my memory serves right, binary tree dict and hashed table dict is
> both a dict right? (Duck Typing)
> Only their implementation differs. Implementation is... well,
> "implementation detail".
Duck typing refers to *interface*, not implementation. I have no problem
with you using a type with the same interface as a dict. That's what duck
typing is all about. Just don't call it a dict!
>> and not know exactly what performance characteristics mydict will have.
>
> Oh... why do I need to know what the performance characteristic of
> mydict is? Unless I know what I'm doing.
http://www.joelonsoftware.com/articles/fog0000000319.html
Because when you do this:
mydict[key] = 1
it's important whether each dict lookup is O(1), O(log N) or O(N). For a
dict with one million items, that means that an implementation based on a
binary tree does O(20) times more processing than a dict, and an
implementation based on linear searching does O(1000000) times more
processing.
If you think implementation details don't matter, try this:
s1 = 'c'*(10**6)
versus
s2 = ''
for i in xrange(10**6):
s2 = 'c' + s2 # defeat optimizer
>> (Unless you shadow dict or type, or otherwise do something that breaks
>> the rules.) You never need to ask, "Okay, it's a dict. What sort of
>> dict?"
>
> Okay, it's a dict. What sort of dict? Who the hell cares?
If you don't care, then why are you specifying the implementation type?
mydict = dict({'foo': 'bar'}, implementation="surprise me!")
You can't have it both ways. If you care, then you know enough to want a
hash table based dict (the standard) or a binary tree or something else.
So go ahead and use whatever data structure you want. Just don't call it
a dict.
But if you don't care, then just use the Python standard data types.
> I don't need
> to know, they all looks and behave the same (Duck Typing)...
Until you try a simple operation like len(mydict) and it takes three
minutes because the implementation you choose is really inefficient.
> Sometimes we need a data type to use a specific data structure that have
> some specific performance characteristic, because we know we'll be doing
> a specific operation a lot more than other operations.
I'm not arguing that you should never use any other data structure. Go
right ahead and use them all you like.
>> If you want a binary tree, ask for a binary tree.
>
> Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
> implementation = 'binarytree')
Or you could do it the right way:
D1 = binarytree(data)
D2 = dict(data)
type(D1), type(D2)
=> returns binarytree, dict
versus:
D1 = dict(data, implementation='binarytree')
D2 = dict(data)
type(D1), type(D2)
=> returns dict, dict
--
Steven
More information about the Python-list
mailing list