Ordering python sets

Lie Ryan lie.1296 at gmail.com
Sun Oct 26 08:18:12 EDT 2008


On Sun, 26 Oct 2008 00:53:18 +0000, Steven D'Aprano wrote:
[...]
> And how do you find an arbitrary object's creation point without
> searching the project's source code?

How is it better using the current way?
Asking the .implementation field isn't much harder than asking the type
(), and is much more obvious that we're asking the "implementation" being 
used.

>>> If I wanted that sort of nightmare, I can already do it by shadowing
>>> the builtin:
>>> 
>>> dict = binarytree
>>> D = dict({'a': 'A'})  # make a binary tree
>> 
>> I DON'T want THAT sort of nightmare you mentioned... And it'd be
>> impossible to have two dictionary that have two different
>> implementations.
>
> Nonsense.
>
<snip>

There is two good arguments for using a plain string to specify the 
implementation used: 1) Plain-text Configuration, 2) argument passing.

1) The current way requires you have a map of the configuration text to a 
function (i.e. imps = {'linkedlist': linkedlist}), than assign the 
function to _a global name_ (i.e. lista = imps[configuration[confkey]] -- 
namespace pollution), then instantiate an object using that (ls = lista
([...]))). The implementation way, only requires ls = list([...], 
implementation = configuration[confkey])

2) Easily said, plain text passing is safer and easier than passing 
function object.

>>> There is no possible good that come from this suggestion. The beauty
>>> of Python is that the built-in data structures (list, dict, set) are
>>> powerful enough for 99% of uses[1], and for the other 1%, you can
>>> easily and explicitly use something else.
>> 
>> Oh really? As far as I know, python's list is extremely bad if you're
>> inserting data at the beginning of the list
> 
> And how often do you do that?

It's not even a deque yet, a regular queue is slow using list in python, 
and any insertion sort requires you to insert elements into arbitrary 
position. This makes using array for it a O(n**2). On the other hand, 
using binary tree is a O(log n). 

> [...]
>>> But *explicitly* is the point. There's never any time where you can do
>>> this:
>> 
>> Yes, true, explicitly IS the point. How more explicit can you be than:
>> dict({foo: bar}, implementation = 'binarytree')
>> 
>>> type(mydict) is dict
> 
> 
> You miss the point. With your plan, you can do this:
> 
> D1 = dict({foo: bar}, implementation = 'binarytree') D2 = dict({foo:
> bar}, implementation = 'dict') type(D1) is type(D2)
> 
> and yet D1 and D2 have UTTERLY different performance characteristics. 

It does not matter that it have different performance characteristic.

> So
> now you need to add ANOTHER test to distinguish dicts-which-are-dicts
> from dicts-which-are-binary-trees:

How often would you need to care about the specific implementation used? 
Compare with how often you want to know how to work with the object. Most 
of the time, you only need to know how to work with it (i.e. knowing its 
interface), only _sometimes_ when it DOES matter speed-wise, you'd need 
to know and affect its implementation.

> D1.implementation != D2.implementation
> 
> And why? So you can avoid calling a goose a goose, and call it a duck
> instead.

I think we have an utterly different view here.
I wished that list() becomes an abstract type, instead of a data 
structure. (binarytreelist and arraylist is some property of a list)

While you insisted that list() is a data structure, and abstract type is 
some obscureness somewhere unknown. 

>> If my memory serves right, binary tree dict and hashed table dict is
>> both a dict right? (Duck Typing)
>> Only their implementation differs. Implementation is... well,
>> "implementation detail".
> 
> Duck typing refers to *interface*, not implementation. 

Well said, you've just supported my view. 

> I have no problem
> with you using a type with the same interface as a dict. That's what
> duck typing is all about. Just don't call it a dict!

why do you think binarytreedict is "less dict" than hasheddict?

>>> and not know exactly what performance characteristics mydict will
>>> have.
>> 
>> Oh... why do I need to know what the performance characteristic of
>> mydict is? Unless I know what I'm doing.
> 
> http://www.joelonsoftware.com/articles/fog0000000319.html

Sometimes it is good to have control of everything. Sometimes (most of 
the time) I know I can dump some of the details to think on more 
important matters. To know which parts need more thinking by doing 
profiling.

> Because when you do this:
> 
> mydict[key] = 1
> 
> it's important whether each dict lookup is O(1), O(log N) or O(N). For a
> dict with one million items, that means that an implementation based on
> a binary tree does O(20) times more processing than a dict, and an
> implementation based on linear searching does O(1000000) times more
> processing.

Well, I don't need to know how the dict works if I'm only using it to 
store twenty items right? (I do know how python's dict/set/list/etc is 
implemented, but unless that particular dict is central to my program, 
I'm not going to create my programs around how dict is implemented, I'm 
going to create it on how my mind works). 

I don't care how print statement works, unless I'm trying to do something 
fancy with print. I don't care that print takes a series of electrical 
signal, then convert it into a series of bytes and pass it into the 
terminal stdout then the terminal will figure out how to draw the string 
using the appropriate font at the appropriate place and pass itself to a 
window manager which would compose the desktop into an image then pass 
itself to the graphic driver which would convert the image to electrical 
signal to be sent to the graphic card which would do some fancy tricks 
before sending it to the monitor, when what I need is only to show the 
user a simple welcome message.

The reason we use high-level language is that we know we don't need to 
care about some details. Only when the detail matters then we take care 
of it. If I do care about all the details, I'd use C, no, no, lower, I'd 
use assembly, no, lower..., I'd design my own CPU, no even lower, I'd 
design my own non-von-neumann architecture. I don't think I can change 
the rules of physics though, but if I can... <some more pseudo random 
bytes>

>> I don't need
>> to know, they all looks and behave the same (Duck Typing)...
> 
> Until you try a simple operation like len(mydict) and it takes three
> minutes because the implementation you choose is really inefficient.

... then I know I have chosen the wrong data structure, and I know I need 
to change it.
 
>>> If you want a binary tree, ask for a binary tree.
>> 
>> Yeah, you ask for binary tree EXPLICITLY: bintreedict = dict({a:b},
>> implementation = 'binarytree')
> 
> Or you could do it the right way:
<snip>

For example, if I have a certain unknown object. What is easier?

>>> type(URO) # Unidentified Round Object
=> returns (*&%$^%*

or

>>> type(URO)
=> returns diningplate

only after three thousand years of continuous and laborious researching, 
at last I found out that (*&%$^%* is an alien dining plate, and it does 
have all the features and functionality as earth dining plate, although 
due to it being made of CClKO32C32CH42U it is much more resistant to 
acidity but would melt when you put carrot on it.

Only when I want to eat carrot, would I care that it is an alien dining 
plate. And only when I want to eat something extremely acidic, I'd chose 
the alien dining plate over earth dining plate.




More information about the Python-list mailing list