The type of the result of the copy() method
The copy() methods of list, dict, bytearray, set, frozenset, WeakValueDictionary, WeakKeyDictionary return an instance of the base type containing the content of the original collection.
The copy() methods of deque, defaultdict, OrderedDict, Counter, ChainMap, UserDict, UserList, WeakSet, ElementTree.Element return an instance of the same type as the original collection.
The copy() method of mappingproxy returns a copy of the underlying mapping (using its copy() method).
os.environ.copy() returns a dict.
Shouldn't it be more consistent?
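For readers who want to check the behavior listed above, here is a minimal sketch, assuming a recent CPython 3.x. The subclass names are hypothetical and exist only so that the type returned by copy() can be observed.

```python
import os
import types
from collections import OrderedDict, deque


# Hypothetical subclasses, used only to observe what copy() returns.
class MyList(list):
    pass


class MyDict(dict):
    pass


class MyDeque(deque):
    pass


class MyOrderedDict(OrderedDict):
    pass


# These copy() methods fall back to the base type.
list_copy_type = type(MyList([1, 2]).copy())
dict_copy_type = type(MyDict(a=1).copy())

# These copy() methods keep the type of the original object.
deque_copy_type = type(MyDeque([1, 2]).copy())
odict_copy_type = type(MyOrderedDict(a=1).copy())

# mappingproxy delegates to the copy() method of the wrapped mapping.
proxy_copy_type = type(types.MappingProxyType(OrderedDict(a=1)).copy())

# os.environ.copy() gives a plain dict.
environ_copy_type = type(os.environ.copy())
```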
It probably should be more consistent and I have a vague recollection that
this has been brought up before.
On Sun, Oct 29, 2017, 08:21 Serhiy Storchaka wrote:
The copy() methods of list, dict, bytearray, set, frozenset, WeakValueDictionary, WeakKeyDictionary return an instance of the base type containing the content of the original collection.
The copy() methods of deque, defaultdict, OrderedDict, Counter, ChainMap, UserDict, UserList, WeakSet, ElementTree.Element return an instance of the same type as the original collection.
The copy() method of mappingproxy returns a copy of the underlying mapping (using its copy() method).
os.environ.copy() returns a dict.
Shouldn't it be more consistent?
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org
It's somewhat problematic. If I subclass dict with a different constructor,
but I don't override copy(), how can the dict.copy() method construct a
correct instance of the subclass? Even if the constructor signatures match,
how can dict.copy() make sure it copies all attributes properly? Without an
answer to these questions I think it's better to admit defeat and return a
dict instance -- classes that want to do better should override copy().
I notice that Counter.copy() has all the problems I indicate here -- it
works as long as you don't add attributes or change the constructor
signature. I bet this isn't documented anywhere.
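The two Counter.copy() problems mentioned above can be reproduced with a short sketch. It assumes Counter.copy() builds its result by calling the class with the counter as its only argument, which is how CPython's pure-Python implementation does it. TaggedCounter and LabeledCounter are hypothetical names.

```python
from collections import Counter


class TaggedCounter(Counter):
    """Hypothetical subclass that only gains an instance attribute."""


class LabeledCounter(Counter):
    """Hypothetical subclass whose constructor requires an extra argument."""

    def __init__(self, iterable=None, *, label):
        super().__init__(iterable)
        self.label = label


# Problem 1: the type is preserved, but the added attribute is not copied.
tagged = TaggedCounter("aab")
tagged.tag = "letters"
tagged_copy = tagged.copy()
attribute_survived = hasattr(tagged_copy, "tag")

# Problem 2: the inherited copy() cannot supply the required `label`.
labeled = LabeledCounter("aab", label="letters")
try:
    labeled.copy()
    copy_failed = False
except TypeError:
    copy_failed = True
```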
On Sun, Oct 29, 2017 at 9:40 AM, Brett Cannon wrote:
It probably should be more consistent and I have a vague recollection that this has been brought up before.
-- --Guido van Rossum (python.org/~guido)
On Oct 29, 2017, at 10:04 AM, Guido van Rossum wrote:
Without an answer to these questions I think it's better to admit defeat and return a dict instance
I think it is better to admit success and recognize that these APIs have fared well in the wild.
Focusing just on OrderedDict() and dict(), I don't see how to change the copy() method for either of them without breaking existing code. OrderedDict *is* a dict subclass but really does need to have copy() return an OrderedDict.
The *default* behavior for any pure python class is for copy.copy() to return an instance of that class. We really don't want ChainMap() to return a dict instance -- that would defeat the whole purpose of having a ChainMap in the first place.
And unlike the original builtin classes, most of the collection classes were specifically designed to be easily subclassable (not making the subclasser do work unnecessarily). These aren't accidental behaviors:
    class ChainMap(MutableMapping):
        def copy(self):
            'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]'
            return self.__class__(self.maps[0].copy(), *self.maps[1:])
Do you really want that changed to:
            return ChainMap(self.maps[0].copy(), *self.maps[1:])
Or to:
            return dict(self)
Do you really want Serhiy to sweep through the code and change all of these long standing APIs, overriding the decisions of the people who designed those classes, and breaking all user code that reasonably relied on those useful and intentional behaviors?
Raymond
P.S. Possibly related: We've gone out of our way in many classes to have a __repr__ that uses the name of the subclass. Presumably, this is to make life easier for subclassers (one less method they have to override), but it does make an assumption about what the subclass signature looks like. IIRC, our position on that has been that a subclasser who changes the signature would then need to override the __repr__. ISTM that similar reasoning would apply to copy.
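The effect of the ChainMap.copy() implementation quoted above can be seen in a small sketch. Settings is a hypothetical subclass that adds one convenience method and leaves the constructor unchanged.

```python
from collections import ChainMap


class Settings(ChainMap):
    """Hypothetical subclass that only adds a convenience method."""

    def depth(self):
        return len(self.maps)


defaults = {"color": "blue"}
overrides = {"color": "red"}

original = Settings(overrides, defaults)
clone = original.copy()

# Writes go to maps[0], which copy() duplicated, so the original is untouched.
clone["color"] = "green"
```

The clone is still a Settings instance, its first mapping is a new dict, and the remaining mappings are shared with the original by reference.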
On Sun, Oct 29, 2017 at 10:41 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Oct 29, 2017, at 10:04 AM, Guido van Rossum wrote:
Without an answer to these questions I think it's better to admit defeat and return a dict instance
I think it is better to admit success and recognize that these APIs have fared well in the wild.
Oh, I agree!
Focusing just on OrderedDict() and dict(), I don't see how to change the copy() method for either of them without breaking existing code. OrderedDict *is* a dict subclass but really does need to have copy() return an OrderedDict.
And I wasn't proposing that. I like what OrderedDict does -- I was just suggesting that the *default* dict.copy() needn't worry about this.
The *default* behavior for any pure python class is for copy.copy() to return an instance of that class. We really don't want ChainMap() to return a dict instance -- that would defeat the whole purpose of having a ChainMap in the first place.
Of course.
And unlike the original builtin classes, most of the collection classes were specifically designed to be easily subclassable (not making the subclasser do work unnecessarily). These aren't accidental behaviors:
    class ChainMap(MutableMapping):
        def copy(self):
            'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]'
            return self.__class__(self.maps[0].copy(), *self.maps[1:])
Do you really want that changed to:
            return ChainMap(self.maps[0].copy(), *self.maps[1:])
Or to:
            return dict(self)
I think you've misread what I meant. (The defeat I referred to was accepting the status quo, no matter how inconsistent it seems, not a withdrawal to some other seemingly inconsistent but different rule.)
Do you really want Serhiy to sweep through the code and change all of these long standing APIs, overriding the decisions of the people who designed those classes, and breaking all user code that reasonably relied on those useful and intentional behaviors?
No, and I never said that. Calm down.
Raymond
P.S. Possibly related: We've gone out of our way in many classes to have a __repr__ that uses the name of the subclass. Presumably, this is to make life easier for subclassers (one less method they have to override), but it does make an assumption about what the subclass signature looks like. IIRC, our position on that has been that a subclasser who changes the signature would then need to override the __repr__. ISTM that similar reasoning would apply to copy.
I don't think the same reasoning applies. When the string returned doesn't indicate the true class of the object, debugging becomes a lot harder. If the signature in the repr() output is wrong, the user can probably deal with that. And yes, the subclasser who wants the best possible repr() needs to override it, but the use cases don't match. -- --Guido van Rossum (python.org/~guido)
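The __repr__ behavior under discussion can be observed with a short sketch, assuming CPython's OrderedDict and deque. Registry and BoundedHistory are hypothetical subclasses. The second one changes the constructor signature, so its repr names the true class but does not describe a call the subclass would accept.

```python
from collections import OrderedDict, deque


class Registry(OrderedDict):
    """Hypothetical subclass with an unchanged constructor."""


class BoundedHistory(deque):
    """Hypothetical subclass with a changed constructor signature."""

    def __init__(self, *, limit):
        super().__init__(maxlen=limit)


# Both reprs start with the subclass name, which helps when debugging.
registry_repr = repr(Registry(a=1))
history_repr = repr(BoundedHistory(limit=3))
```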
29.10.17 19:04, Guido van Rossum wrote:
It's somewhat problematic. If I subclass dict with a different constructor, but I don't override copy(), how can the dict.copy() method construct a correct instance of the subclass? Even if the constructor signatures match, how can dict.copy() make sure it copies all attributes properly? Without an answer to these questions I think it's better to admit defeat and return a dict instance -- classes that want to do better should override copy().
I notice that Counter.copy() has all the problems I indicate here -- it works as long as you don't add attributes or change the constructor signature. I bet this isn't documented anywhere.
I am familiar with these reasons, and agree with them. But I'm curious why some collections chose to create an instance of the same class. For creating an instance of the same class we have the __copy__() method.
An attempt to preserve the class in the returned value can cause problems. For example, the __add__() and __mul__() methods of deque first make a copy of the same type, and this can cause a crash [1]. Of course this does not occur in real code; it is just one more way of crashing the interpreter from Python code. list and tuple are free from this problem since their corresponding methods (as well as copy()) create an instance of the corresponding base type.
I think there were reasons for copying the type in results. It would be nice to formalize the criteria: in what cases copy() and other methods should return an instance of the base class, and in what cases they should create an instance of the same type as the original object. This would help for new types. And maybe we need to change some existing types (the inconsistency between WeakKeyDictionary and WeakSet looks weird).
[1] https://bugs.python.org/issue31608
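The contrast drawn above between list and deque, and between copy() and the __copy__ protocol, can be illustrated with a minimal sketch, assuming a CPython release that includes the fix for the crash referenced in [1]. MyList and MyDeque are hypothetical subclasses.

```python
import copy
from collections import deque


class MyList(list):
    pass


class MyDeque(deque):
    pass


# list concatenation always builds a plain list.
list_sum = MyList([1]) + MyList([2])

# deque concatenation starts from a same-type copy of the left operand.
deque_sum = MyDeque([1]) + MyDeque([2])

# copy.copy() goes through the copy protocol and keeps the subclass,
# while the list.copy() method returns the base type.
protocol_copy = copy.copy(MyList([1, 2]))
method_copy = MyList([1, 2]).copy()
```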
On Tue, Oct 31, 2017 at 3:12 AM, Serhiy Storchaka wrote:
I am familiar with these reasons, and agree with them. But I'm curious why some collections chose to create an instance of the same class. For creating an instance of the same class we have the __copy__() method.
An attempt to preserve the class in the returned value can cause problems. For example, the __add__() and __mul__() methods of deque first make a copy of the same type, and this can cause a crash [1]. Of course this does not occur in real code; it is just one more way of crashing the interpreter from Python code. list and tuple are free from this problem since their corresponding methods (as well as copy()) create an instance of the corresponding base type.
I think there were reasons for copying the type in results. It would be nice to formalize the criteria: in what cases copy() and other methods should return an instance of the base class, and in what cases they should create an instance of the same type as the original object. This would help for new types. And maybe we need to change some existing types (the inconsistency between WeakKeyDictionary and WeakSet looks weird).
I think it all depends on the use case. (Though in some cases I suspect the class' author didn't think too hard about it.)
The more strict rule should be that a base class cannot know how to create a subclass instance and hence it should not bother. (Or perhaps it should use the __copy__ protocol.)
But there are some cases where there is a useful pattern of subclassing a stdlib class just to add some convenience methods to it, without changing its essence. In those cases, it might be convenient that by default you get something that preserves its type (and full contents) when copying without having to explicitly implement copy() or __copy__().
Another useful rule is that if a class *does* have a copy() method, a subclass *ought* to override it (or __copy__()) to make it work right.
IOW from the class author's POV, copy() should not attempt to copy the type of a subclass. But from the user's POV copy() is more useful if it copies the type. This places the burden on the subclass author to override copy() or __copy__().
Traditionally we've done a terrible job at documenting what you should do to subclass a class, and what you can expect from the base class (e.g. which parts of the base class are part of the API for subclasses, and which parts are truly private -- underscores aren't used consistently in many class implementations).
For those classes that currently preserve the type in copy(), perhaps we could document that if one overrides __init__() or __new__() one should also override copy() or __copy__(). And for future classes we should recommend whether it's preferred to preserve the type in copy() or not -- I'm not actually sure what to recommend here. I guess it depends on what other methods of the class return new instances. If there are a lot (like for int or str) then copy() should follow those methods' lead.
Sorry about the rambling, this is hard to get consistent.
-- --Guido van Rossum (python.org/~guido)
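The rule suggested above, that a subclass which overrides __init__() should also override copy() and __copy__(), might look like the following sketch. NamedOrderedDict and its `name` parameter are hypothetical; this is one possible way to follow the rule, not an established stdlib pattern.

```python
import copy
from collections import OrderedDict


class NamedOrderedDict(OrderedDict):
    """Hypothetical subclass whose constructor takes a required name."""

    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name

    def copy(self):
        # Rebuild through our own constructor so that `name` is carried over.
        return type(self)(self.name, self)

    def __copy__(self):
        # Keep copy.copy() consistent with the copy() method.
        return self.copy()


original = NamedOrderedDict("config", a=1, b=2)
method_copy = original.copy()
protocol_copy = copy.copy(original)
```

Without the two overrides, the inherited copy() would try to build the subclass without a name and fail, which is the burden on the subclass author described above.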
On Oct 29, 2017, at 8:19 AM, Serhiy Storchaka wrote:
The copy() methods of list, dict, bytearray, set, frozenset, WeakValueDictionary, WeakKeyDictionary return an instance of the base type containing the content of the original collection.
The copy() methods of deque, defaultdict, OrderedDict, Counter, ChainMap, UserDict, UserList, WeakSet, ElementTree.Element return an instance of the same type as the original collection.
The copy() method of mappingproxy returns a copy of the underlying mapping (using its copy() method).
os.environ.copy() returns a dict.
Shouldn't it be more consistent?
Not really. It is up to the class designer to make a decision about what the most useful behavior would be for subclassers. Note for a regular Python class, copy.copy() by default creates an instance of the subclass. On the other hand, instances like int() are harder to subclass because all the int operations such as __add__ produce exact int() instances (this is likely because so few assumptions can be made about the subclass and because it isn't clear what the semantics would be otherwise).
Also, the time to argue and change APIs is BEFORE they are released, not a decade or two after they've lived successfully in the wild.
Raymond
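Both defaults mentioned above are easy to check with a short sketch. Celsius and Point are hypothetical classes used only for illustration.

```python
import copy


class Celsius(int):
    """Hypothetical int subclass with no overridden operators."""


class Point:
    """Hypothetical plain Python class with no copy support of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Arithmetic on an int subclass produces an exact int.
total = Celsius(20) + Celsius(5)

# copy.copy() on a regular class returns an instance of that same class.
point_copy = copy.copy(Point(1, 2))
```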
participants (4): Brett Cannon, Guido van Rossum, Raymond Hettinger, Serhiy Storchaka