PEP 455: TransformDict

Hello, Following the python-dev discussion, I've written a PEP to recap the proposal and the various arguments. It's inlined below, and it will probably appear soon at http://www.python.org/dev/peps/pep-0455/, too. Regards Antoine. PEP: 455 Title: Adding a key-transforming dictionary to collections Version: $Revision$ Last-Modified: $Date$ Author: Antoine Pitrou <solipsis@pitrou.net> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 13-Sep-2013 Python-Version: 3.4 Post-History: Abstract ======== This PEP proposes a new data structure for the ``collections`` module, called "TransformDict" in this PEP. This structure is a mutable mapping which transforms the key using a given function when doing a lookup, but retains the original key when reading. Rationale ========= Numerous specialized versions of this pattern exist. The most common is a case-insensitive case-preserving dict, i.e. a dict-like container which matches keys in a case-insensitive fashion but retains the original casing. It is a very common need in network programming, as many protocols feature some arrays of "key / value" properties in their messages, where the keys are textual strings whose casing isn't relevant. Another common request is an identity dict, where keys are matched according to their respective id()s instead of normal matching. Both are instances of a more general pattern, where a given transformation function is applied to keys when looking them up: that function being ``str.lower`` in the former example and the built-in ``id`` function in the latter. (it can be said that the pattern *projects* keys from the user-visible set onto the internal lookup set, hence this PEP's title) Semantics ========= TransformDict is a ``MutableMapping`` implementation: it faithfully implements the well-known API of mutable mappings, as ``dict`` itself and other dict-like classes in the standard library. Therefore, this PEP won't rehash the semantics of most TransformDict methods. The transformation function needn't be bijective, it can be strictly surjective as in the case-insensitive example::
TransformDict retains the first key used when creating an entry::
The original keys needn't be hashable, as long as the transformation function returns a hashable one::
Constructor ----------- As shown in the example aboves, creating a TransformDict requires passing the key transformation function as the first argument (much like creating a ``defaultdict`` requires passing the factory function as first argument). The constructor also takes other optional arguments which can be used to initialize the TransformDict with certain key-value pairs. Those optional arguments are the same as in the ``dict`` and ``defaultdict`` constructors::
Alternative proposals and questions =================================== Retaining the last original key ------------------------------- Most python-dev respondents found retaining the first user-supplied key more intuitive than retaining the last. Also, it matches the dict object's own behaviour when using different but equal keys::
Furthermore, explicitly retaining the last key in a first-key-retaining scheme is still possible using the following approach:: d.pop(key, None) d[key] = value while the converse (retaining the first key in a last-key-retaining scheme) doesn't look possible without rewriting part of the container's code. Using an encoder / decoder pair ------------------------------- Using a function pair isn't necessary, since the original key is retained by the container. Moreover, an encoder / decoder pair would require the transformation to be bijective, which prevents important use cases like case-insensitive matching. Providing a transformation function for values ---------------------------------------------- Dictionary values are not used for lookup, their semantics are totally irrelevant to the container's operation. Therefore, there is no point in having both an "original" and a "transformed" value: the transformed value wouldn't be used for anything. Providing a specialized container, not generic ---------------------------------------------- It was asked why we would provide the generic TransformDict construct rather than a specialized case-insensitive dict variant. The answer is that it's nearly as cheap (code-wise and performance-wise) to provide the generic construct, and it can fill more use cases. Implementation ============== A patch for the collections module is tracked on the bug tracker at http://bugs.python.org/issue18986. Existing work ============= Case-insensitive dicts are a popular request: * http://twistedmatrix.com/documents/current/api/twisted.python.util.Insensiti... * https://mail.python.org/pipermail/python-list/2013-May/647243.html * https://mail.python.org/pipermail/python-list/2005-April/296208.html * https://mail.python.org/pipermail/python-list/2004-June/241748.html * http://bugs.python.org/msg197376 * http://stackoverflow.com/a/2082169 * http://stackoverflow.com/a/3296782 * http://code.activestate.com/recipes/66315-case-insensitive-dictionary/ * https://gist.github.com/babakness/3901174 * http://www.wikier.org/blog/key-insensitive-dictionary-in-python * http://en.sharejs.com/python/14534 * http://www.voidspace.org.uk/python/archive.shtml#caseless Identity dicts have been requested too: * https://mail.python.org/pipermail/python-ideas/2010-May/007235.html * http://www.gossamer-threads.com/lists/python/python/209527 Python's own pickle module uses identity lookups for object memoization: http://hg.python.org/cpython/file/0e70bf1f32a3/Lib/pickle.py#l234 Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

13.09.13 21:40, Antoine Pitrou написав(ла):
Please use str.casefold in examples.
>>> d = TransformDict(str.lower, [('Foo': 1)], Bar=2)
{'Foo': 1} or [('Foo', 1)].
Except lightweight IdentityDict which can be implemented more efficient than TransformDict(id). It doesn't need in calling the transform function, computing the hash of transformed key, comparing keys. But perhaps in many cases TransformDict(id) is enough.
Also copy, json, cProfile, doctest and _threading_local.

On Fri, 13 Sep 2013 22:31:02 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
Ok, thanks.
That's true. But it's only important if TransformDict is the bottleneck. I doubt the memoizing dictionary is a bottleneck in e.g. the pure Python implementation of pickle or json.
Thanks, will add them. Regards Antoine.

On Fri, 13 Sep 2013 16:54:01 -0300 "Joao S. O. Bueno" <jsbueno@python.org.br> wrote:
I see the PEP does not contemplate a way to retrieve the original key - like we've talked about somewhere along the thread.
Indeed. If that's important I can add it. I was hoping to keep very close to the MutableMapping API, to make the PEP as few controversial as possible. Regards Antoine.

On Fri, 13 Sep 2013 23:18:39 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
Ok, I have a better (IMO) proposal: >>> d = TransformDict(str.casefold, {'Foo': 1}) >>> d.getitem('foo') ('Foo', 1) >>> d.getitem('bar') Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'bar' Regards Antoine.

On Fri, Sep 13, 2013 at 11:45:16PM +0200, Antoine Pitrou wrote:
Is it more common to want both the canonical key and value at the same time, or to just want the canonical key? My gut feeling is that I'm likely to have code like this: d = TransformDict(...) for key in data: key = d.get_canonical(key) value = d[key] print("{}: {}".format(key, value)) in which case having a single call to return both will be great: for key in data: key, value = d.getitem(key) print("{}: {}".format(key, value)) but I'm really not sure. Maybe Ethan is right. I think these sorts of associated questions are why some people (Raymond, Nick) want to see the proposal live outside of the standard library for a while first. The general idea is great, but we're going to bike shed the API without having much idea of how it will actually be used. So, my suggestion is this: - Let's add __transform__ to dicts for 3.4, similar to __missing__, and see how people end up using it in the real world. - Then, in 3.5, we can make a much better informed decision about the best API for a TransformedDict front-end to __transform__. +1 on __transform__ method on dicts. +0 on TransformedDict in 3.4 +1 on waiting for 3.5 based on experience on using __transform__. -- Steven

On 09/13/2013 05:49 PM, Steven D'Aprano wrote:
+1 on __transform__ method on dicts.
What would __transform__ do? Just canonicalize the key, or also save the original key? How much front-end work will each user have to do to actually use this new magic method to good effect? Personally, if there's a bunch of push-back against just adding TransformDict directly, why don't we make it provisional? I thought that was what provisional was for (meaning: we're going to add it, PyPI is not really appropriate, there may be some API changes). -- ~Ethan~

On Fri, Sep 13, 2013 at 06:00:18PM -0700, Ethan Furman wrote:
Not according to PEP 411. It implies that only modules/packages can be provisional, not individual functions, and states that "most packages" are expected to be provisional. So either PEP 411 doesn't apply to TransformDict at all, or it applies by default. The PEP doesn't say. http://www.python.org/dev/peps/pep-0411/ Everything below the line is about PEP 411, not TransformDict. If you don't care about PEP 411, you can stop reading now. ========================== Personally, I think it's a poor PEP. It doesn't document opposition to the idea, and if I recall the discussion at the time correctly, there was plenty of opposition. - Since people cannot rely on provisional features still being available in the future, if they care the slightest about forward compatibility, they dare not use them. - If people do use them, and we care about backwards compatibility, we dare not remove provisional packages without going through the same deprecation process as for ordinary packages. - It relies on people reading the documentation and noticing that a package is marked "provisional". Like that's going to happen. None of these arguments are documented in the PEP, let alone refuted. Even if the decision to approve the PEP ends up being vindicated, I think it was poor form for the PEP to ignore arguments against. I don't think that "provisional" helps end users at all. If anything, it hurts them -- it means more uncertainty and doubt. Packages may languish in provisional status indefinitely. Speaking as an end user, I used to know that once a feature hit the std lib, it was stable and wouldn't be removed without going through a lengthy period of deprecation (except under truly remarkable circumstances). But for provisional packages, that promise doesn't apply, and although PEP 411 says that packages won't be gratuitously removed, I have to assume that any provisional package in version 3.x could be gone without notice or warning in 3.x+1. I don't see how that helps me. It just means that for some indefinite period, there's a package in the std lib doing exactly what I want that I dare not use in production software because there are no stability guarantees. The PEP has a lengthy section that claims to be about how it will help end users. It actually isn't -- it is about how inclusion in the standard library helps end users. Not surprisingly, there's nothing about why reserving the right to rip out a package without warning is good for end uers, since it isn't. End of rant. -- Steven

As will become evident, I disagree with Steven at almost every point. However, I think his point about users not reading documentation is well-taken. Due to hyperlinking, users are very likely to skip past module docstrings. I suggest two (perhaps informal) additions to the documentation policy in PEP 411: (1) Provisional packages should be explicitly described as such in release announcements. I think this is done already, cf. "Improvements in email" in the Release announcement and What's New for Python 3.3.0.[1] However the policy should be explicit and the word "provisional" should be emphasized and linked to PEP 411. (2) Individual APIs in those packages should be documented as provisional. Steven D'Aprano writes:
You exaggerate to the extent that you harm your argument. People make tradeoffs. Some will choose the extreme you describe here, others will take more risk. Your real argument is that the benefits are small because you expect that provisional stdlib packages will not get much if any additional use over PyPI packages. Can't know without trying ... thus, the PEP. Similar considerations apply to your other arguments.
None of these arguments are documented in the PEP, let alone refuted.
They're not real arguments, as stated. If you rephrase them as actual arguments, they turn out to merely be the "half-empty" phrasing of the "half-full" arguments used in favor. Can't know without trying.
I don't think that "provisional" helps end users at all. If anything, it hurts them -- it means more uncertainty and doubt.
True, it *increases* the uncertainty *inside* the stdlib. On the flip side, it decreases uncertainty *overall* by announcing that the core developers *want* this feature in stdlib *now*, and probably the API won't change *at all*, and very probably the API won't change "much".
It's unlikely it will be "gone", it will end up on PyPI.
I don't see how that helps me.
Since you evidently demand absolute certainty, you're right, it doesn't help you with respect to any given provisional package. I'm more flexible, and I greatly value having the core developers evaluate the modules on PyPI (or the modules that have only seen the flourescent lights of Google Labs up to now) for me, and putting their Good Codesmithing Stamp of Approval on some of them more quickly. And if in fact the policy works in the sense that more code gets into the stdlib (non-provisionally) faster than without the PEP, in the long run *you* benefit even with if you adopt a policy of using only non-provisional APIs.
Right. And this PEP is about how to get good code into the stdlib faster. It's a means to that intermediate end, and transitively a means to the real end of helping end users.
What it really comes down to is you're saying you fear you will frequently disagree with the judgment of python-dev with regard to the options provided by "provisional". You want them bound by hard and fast rules. That's fine; everybody has their own requirements. Nevertheless, it's logically possible that as long as you pay attention to "provisional" markings this PEP will be a win for you. I agree with the PEP proponents that "logically possible" can be strengthened to "probably true", but of course YMMV. Footnotes: [1] http://www.python.org/download/releases/3.3.0/ http://docs.python.org/3.3/whatsnew/3.3.html

On 14 September 2013 12:44, Steven D'Aprano <steve@pearwood.info> wrote:
Correct, that's exactly what they should do.
Correct, it doesn't help you until it graduates from provisional status. Less risk averse users may choose to use it sooner (since any removed modules would almost certainly find a home on PyPI in fairly short order)
"We expect that most of the API of most provisional packages will be unchanged at graduation. Withdrawals are expected to be rare." Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 14 September 2013 12:44, Steven D'Aprano <steve@pearwood.info> wrote:
Oops, meant to reply to this part, too. PEP 411 is an Informational PEP, not a standards track PEP. It's there to describe the policy, not to make the case for *having* the policy. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 13, 2013 at 06:00:18PM -0700, Ethan Furman wrote:
Sorry, I missed replying to this part. See Serhiy's post: https://mail.python.org/pipermail/python-dev/2013-September/128633.html and suggested patch here: http://bugs.python.org/issue18986 -- Steven

On Sat, 14 Sep 2013 10:49:52 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
I don't think it really matters. From getitem() it's trivial to extract the key alone.
Even after it's used, it will still be bikeshedded: such is how proposals are generally discussed. There really isn't very much to decide here, and whether we only return a key or a (key, value) pair is almost cosmetic: both APIs are reasonably convenient.
Well, __missing__ isn't used very much, I think. People use defaultdict. (note that I'm not even proposing __transform__ right now :-))
Ok, thanks. Regards Antoine.

On 14/09/2013 01:49, Steven D'Aprano wrote:
I think must be missing something. I thought that iterating over the dict would yield the original keys, so if you wanted the original key and value you would write: for key, value in data.items(): print("{}: {}".format(key, value)) and if you wanted the transformed key you would apply the transform function to the key.

On 09/13/2013 06:25 PM, MRAB wrote:
Well, that's certainly how I would do it. ;)
and if you wanted the transformed key you would apply the transform function to the key.
Indeed. The question is: how? It is entirely possible that your function has a TransformDict alone, and no memory of the transform function used to create the dict... If the key transform function were saved directly on the TransformDict instance as, say, .transform_key, then problem solved. -- ~Ethan~

On Fri, Sep 13, 2013 at 06:40:06PM -0700, Ethan Furman wrote:
While I think it is good and proper to have the transformation function available, if you're manually applying the transformation function then IMO chances are good that you've missed the whole point of the transformdict. Think of defaultdict. The purpose of defaultdict is to avoid needing to check whether the key is missing or not. If I suggested that the way to use a defaultdict was to write code like this: value = d[key] if key in d else d.default_factory() I'd be missing the point. Likewise, the point of transformdict is to avoid needing to care about the transformation function 99% of the time. -- Steven

On 09/13/2013 08:33 PM, Steven D'Aprano wrote:
Likewise, the point of transformdict is to avoid needing to care about the transformation function 99% of the time.
No one's arguing against that point. It's the 1% of the time that being able to get the canonical name back is a Good Thing, and the best way to do that is to have the transform_key function easily available. -- ~Ethan~

On 13 September 2013 22:40, Ethan Furman <ethan@stoneleaf.us> wrote:
I hope you are aware that this pattern does not help when one wants _one_ canonical key having a non-canonical one, besides having to linearly walk through all keys and check the "__transform__" to each one. I mean - given no function to retrieve the canonical key, one would have to resort to: my_key = data.__transform__(given_key) for key, value in data.items(): if data.__transform__(key) == my_key: .... WHich would defeat not only the purpose of a Transform dict, but the purpose of a dict alltogether. (of course, the one obvious way to do it would be to have the original key stored along the value - which is just a bit less silly than the example above) OTOH, having a `get_canonical_key` method or similar seens trivial enough- if the only provided method retrieves the value as well, it would not be that bad.

On 09/13/2013 09:53 PM, Joao S. O. Bueno wrote:
True, but I was thinking Steve was talking about printing the entire dict, in which case that is, indeed, how I would do it.
Which is exactly why I, and others, would like to have the transform function easily available. Besides being able to use it to get a canonical key, one could use it to get the function itself. Yay, introspection! -- ~Ethan~

On Fri, 13 Sep 2013 21:59:11 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Well, no, you misunderstand :) The transform function takes an original key (perhaps "canonical") and returns the transformed key, it can't do the reverse which is what getitem() does. i.e.:
What getitem() does is make the surjection bijective by restricting its input domain to the set of stored keys. Regards Antoine.

14.09.13 20:41, Antoine Pitrou написав(ла):
There is one reason -- serialization. For example pickle saves the transform function as an argument for TransformDict constructor. Repr exposes the transform function too (in evaluable representation). Other serializers need the transform function too. My implementations expose it as public property (transform), your -- as private attribute (_transform). Or perhaps I misunderstood you?

15.09.13 00:58, Antoine Pitrou написав(ла):
The same functools has decorating_function() with the user_function argument. Also find_function, search_function, create_function, isfunction, isgeneratorfunction, from_function, has_function, copy_function, pickle_function, register_function, emit_function, and print_function. This is not counting tests, IDLE and private names.

On Sat, Sep 14, 2013 at 02:25:42AM +0100, MRAB wrote:
On 14/09/2013 01:49, Steven D'Aprano wrote:
You're missing that I'm not iterating over the entire dict, just some subset ("data") that I got from elsewhere. In real code, of course I would have to handle missing keys. My gut feeling is that normally I would want both the canonical key and the value, not just any valid key: # not this for key in data: value = d[key] print("{}: {}".format(key, value)) # but this for key in data: key, value = d.getitem(key) print("{}: {}".format(key, value)) but that's just a gut feeling, and I daresay that it would depend on the application.
and if you wanted the transformed key you would apply the transform function to the key.
The caller may not know or care what the transformation function is. The caller may only care about one or two things: - does this key match a key in the dict? - what is the canonical version of this key? For example, in a case-preserving file system, the canonical version of a file name is whatever the user first typed when they created the file: create file "SoMe FiLe" so the canonical version is, "SoMe FiLe". But any of these may be expected to match: some file SOME FILE Some File some%20file etc. It is *not* simply the case that the canonical version is "some file". Certainly that's not how case-preserving file systems work. -- Steven

On 09/13/2013 08:18 PM, Steven D'Aprano wrote:
You're missing that I'm not iterating over the entire dict, just some subset ("data") that I got from elsewhere.
Ah, okay. Between you and Antoine I am convinced that .getitem() is a good thing. So have that and .transform_key and we're golden! :) -- ~Ethan~

On Fri, 13 Sep 2013 20:40:58 +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
This motivation would be stronger if the last phrase was something like "where the keys are textual strings whose case is specified to be ignored on receipt but by either specification or custom is to be preserved or non-trivially canonicalized when retransmitted."
(it can be said that the pattern *projects* keys from the user-visible set onto the internal lookup set, hence this PEP's title)
Not clear what "projects" has to do with the PEP title.
You don't mention the alternate constructor discussion, or the rationale for following the defaultdict pattern (ie: following the established pattern :) --David

On Fri, 13 Sep 2013 16:09:10 -0400 "R. David Murray" <rdmurray@bitdance.com> wrote:
Thanks, will add.
The PEP was originally titled "a key-projecting dictionary" and I've changed it as it was too obscure. I've now also removed the obsolete part of the sentence above :-)
Ok, will do that too. Regards Antoine.

13.09.13 21:40, Antoine Pitrou написав(ла):
Alternative proposals and questions ===================================
Yet one alternative proposal is to add the dict.__transform__() method. Actually this not contradict TransformDict, but generalize it. TransformDict will be just handly interface to __transform__() as defaultdict to __missing__(). It provides only constructor, repr and pickling. My second patch for http://bugs.python.org/issue18986 shows that as implementation of dict.__transform__(), so implementation of TransformDict using dict.__transform__() are simple enough.

13.09.13 23:21, Antoine Pitrou написав(ла):
Both. On one side, with this proposition TransformDict itself doesn't deserve PEP. It will be trivial and obvious thing. Not very important. On other side, TransformDict can be implemented even without dict.__transform__(), on pure Python.

On Sat, 14 Sep 2013 14:33:56 +0900 Larry Hastings <larry@hastings.org> wrote:
Well, a TransformSet is like a normal dict, you just need to call the transformation function yourself when inserting the keys. i.e.: d = TransformSet(str.lower) d.add('Foo') is the same conceptually as: d = {} d['Foo'.lower()] = 'Foo' d['foo'] # gets the original key Regards Antoine.

On 09/14/2013 07:30 PM, Antoine Pitrou wrote:
s/normal dict/normal set/ But, then, a TransformDict is like a normal dict, you just need to call the transformation function yourself when inserting the keys. Yet a TransformDict is a useful enough concept that it is being proposed for addition; I was wondering if a TransformSet would be useful, too. But I suppose we should take baby steps here. //arry/

15.09.13 14:23, Antoine Pitrou написав(ла):
A TransformDict is like a normal dict, you just need to call the transformation function yourself when inserting the keys and insert a pair of (orig_key, value), or is like two normal dicts, you just need to call the transformation function yourself when inserting the keys and update parallel mapping from transformed keys to original keys.

On 16 September 2013 00:45, Serhiy Storchaka <storchaka@gmail.com> wrote:
I think it's best to hold off a TransformSet, since you can trivially emulate it by setting the values in a TransformDict to None (that's how Python itself lived without a set type for so long: people just used dicts with all the values set to None). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, 14 Sep 2013 12:31:36 -0700 Guido van Rossum <guido@python.org> wrote:
The discussion has settled down for the most part, and there is a consensus amongst participants on the desireability of the feature and its characteristics. The implementation is straightforward pure Python (Serhiy's C proposal should probably be a separate enhancement request on the tracker). I think the proposal will soon be ready for a pronouncement - unless other concrete questions and suggestions are recorded. Thanks Antoine.

On Sep 22, 2013, at 6:16 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
Are we close to asking for pronouncement?
When you're ready, let me know. In the meantime, I conducting usability tests on students in Python classes and researching how well it substitutes for existing solutions for case insensitive dictionaries (the primary use case) and for other existing cases such as dictionaries with unicode normalized keys. If you want to participate in the research, I could also use help looking at what other languages do. Python is not the first language with mappings or to encounter use cases for transforming keys prior to insertion and lookup. I would like to find out what work has already been done on this problem. Another consideration is whether the problem is more general that just dictionaries. Would you want similar functionality in all mapping-like objects (i.e. a persistent dictionaries, os.environ, etc)? Would you want similar functionality for other services (i.e. case-insensitive filenames or other homomorphisms). You can also add to the discussion by trying out your own usability tests on people who haven't been exposed to this thread or the pep. My early results indicate that the API still needs work. * When shown code that uses a TransformDict, students don't seem to be able to deduce what the code does just from the context (this contrasts with something like OrderedDict and Counter where the name says what it does). * When given a description of the mechanics of a TransformDict, they don't seem to be able to figure-out what you would do with it without being given an example. * When given a example of using a TransformDict, they understand the example but don't seem to be able to come-up with other examples other than the one they were just shown. And when shown multiple examples, they can't think of other use cases where they've ever needed this in their own code. * This contrasts with the results when I show something less general like a CaseInsensitiveDict. People seem to get that right away. As you might expect, the generalized solution is harder to wrap your head around than a specific solution with a clear name. * One student asked, "why give regular dicts a key-function like sorted(), min() and max()?" I didn't have a good answer, but I haven't yet had time to read this whole thread. * Another issue is that we're accumulating too many dictionary variants and that is making it difficult to differentiate and choose between them. I haven't found anyone (even in advanced classes with very experienced pythonistas) would knew about all the variations: dict, defaultdict, Mapping, MutableMapping, mapping views, OrderedDict, Counter, ChainMap, andTransformDict. David Beazley on twitter recently proposed that we add a MinDict and MaxDict. There seems to be no shortage of ideas of things that can be done with dictionaries. Besides choosing among the dict variants, there is also confusion about other mapping topics such as 1) when to subclass from dict rather than inherit from MutableMapping, 2) the difference between defaultdict(int) and Counter's use of __missing__ to return zero, and 3) it seems that many experienced users can't even name all the existing methods on dictionaries (they forget clear(), copy(), pop(), popitem(), setdefault(), update() and the fromkeys() classmethod). Overall, my impression at this point is that key transformations are useful, but I'm not sure how to incorporate them without taking Python further away from being a language "that just fits in your head." Raymond

2013/10/4 Raymond Hettinger <raymond.hettinger@gmail.com>:
Ok, but none of these classes address use cases described of the PEP 455. If it became hard to choose the best container for an use case, it's maybe a documentation issue. The PEP 455 contains a long list of existing implementations, so it means that these use cases are common (even if the Python stdlib according to the PEP). It's a good thing that Python proposes a standard implementation (efficient, well tested, documented, etc.) to answer to these use cases. I'm not convinced by your usability test. The problem is maybe the name, TransformDict. We may find a more explicit name, like TranformKeyDict or NormalizedKeyMapping. Or we can use names of the Transformers movies: OptimusPrimeDict, BumblebeeMapping, JazzDictionary, etc. (If we cannot find a better name, we may add more specialized classes: KeyInsensitiveDict and IdentiyDict. But I like the idea of using my own "transform" function.) Victor

On Oct 4, 2013, at 2:06 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
I'm not convinced by your usability test.
You're not the one who needs to be convinced ;-) Please do conduct your own API tests and report back. This is necessary for a new class like TransformDict that was constructed from scratch and proposed for direct admission to the standard library. This contrasts with other tools like OrderedDict, ChainMap, and namedtuple which started their lives outside the standard library where we we able observe their fitness for real problems being solved by real users. None of my consulting client's have anything like a general purpose transforming dict in their utility modules, so we lack the real world experience that informed the design of the other tools in the collections module. To make up for that lack of information, we need to put it in front of users as well as do research into how other languages have tackled the use cases. In short, we need to know whether the API will make sense to people, whether their code will be more readable with a TransformDict, and whether the zoo of dict variants should continue to grow. Right now, I don't know those things. All I have to go on is that I personally think the TransformDict is a cool idea. However, that alone isn't sufficient for accepting the PEP. Raymond “… in order to get things merged you need to solve not only just your own problem but also realize that the world is bigger than your company and try to solve things in a way where it makes sense for other people, even if primarily it is for your own situation.” -- Linus Torvalds http://www.extremeta.com/2013/09/linus-torvalds-said-linuxcon-kernel-develop...

2013/10/4 Raymond Hettinger <raymond.hettinger@gmail.com <javascript:;>>:
Please do conduct your own API tests and report back.
I don't understand why there is a need to evaluate a "new" API: TransformDict has the same API than dict. The only differences are that the constructor takes an extra mandatory parameter and there is a new getitem() method. All other methods are the same: len(dict), key in dict, dict.get(key), dict[key]=value, etc.
The class is not "new": it is based on the API of dict, and existing specialized implementations already used in the wild. But I didn't compare TransformDict to existing specialized implementations... Antoine: do you know if your API is different? Maybe in some minor(?) details like storing the first key or last key.
Why do you say that TransformDict has no real use case, whereas similar containers are already used since many years in the Python standard library? Extract of the PEP: "Several modules in the standard library use identity lookups for object memoization, for example pickle, json, copy, cProfile, doctest and _threading_local." I didn't check this whole list, but it looks like some modules can directly use TransformDict(id), see for example the copy module. And "case insenstive" dictionaries are also used in various projects according to the PEP. Or do you mean that there are too few projects or they are not popular enough? In Python, email.Message uses a case-insensitive container preserving the original case, except that it does also transform the value. email.Message is used in http.client to parse HTTP headers. Twisted has its own InsensitiveDict type for that (which does not transform the value). http://twistedmatrix.com/documents/current/api/twisted.python.util.Insensiti...
“… in order to get things merged you need to solve not only just your own problem but also realize that the world is bigger than your company and
Maybe Antoine can provide us a patch on the Python standard library to reuse TransformDict(id) where it is possible. So we can compare the code before/after to see if it's more readable or not. try
to solve things in a way where it makes sense for other people, even if primarily it is for your own situation.” -- Linus Torvalds
http://www.extremeta.com/2013/09/linus-torvalds-said-linuxcon-kernel-develop... Existing implementations are specific (id() or str.lower()), whereas Antoine's container is generic (transform function). Or do you mean that it's not enough generic? It should do something on the value? Victor

06.10.13 00:08, Victor Stinner написав(ла):
Unfortunately the pickle and the copy modules can't use TransformDict(id) because they expose their mappings (with integer id keys) in public API. At least until Python 4.0. There are no so much use cases for IdentityDict in stdlib now, and even less for CaseInsensityDict. And a workaround is simple. I don't know about usage of TransformDict with other transfom function.

On Fri, Oct 04, 2013 at 11:06:15PM +0200, Victor Stinner wrote:
-1 on a plethora of specialised dicts. I do think that a TransformDict seems useful, and might even *be* useful, but would not like to see a whole pile of specialised dicts added to the std lib. I wonder though, are we going about this the wrong way? Since there is apparently disagreement about TransformDict, that suggests that perhaps we need more concrete experience with the basic idea before graduating to a concrete class in the std lib. Perhaps we should follow the example of dict, __missing__ and defaultdict. The dict class could do something like this on key access: if type(self) is not dict: # This only applies to subclasses, not dict itself. try: transform = type(self).__transform__ except AttributeError: pass else: key = transform(key) # now use the key as usual Am I barking up the wrong tree? Would this slow down dict access too much? -- Steven

On 10/07/2013 02:24 PM, Steven D'Aprano wrote:
Considering that __transform__ would usually not exist, and triggered exceptions are costly, I think it would. From the docs[1]: (10) If a subclass of dict defines a method __missing__, if the key k is not present, the a[k] operation calls that method with the key k as argument. The a[k] operation then returns or raises whatever is returned or raised by the __missing__(k) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__ is not defined, KeyError is raised. __missing__ must be a method; it cannot be an instance variable. For an example, see collections.defaultdict. New in version 2.5. So something more like: transform = getattr(self, '__transform__', None) if transform is not None: key = transform(key) ... A key difference (pun unavoidable ;) between __missing__ and __transform__ is that __missing__ is only called when a key is not found, while __transform__ needs to be called /every/ time a key is looked up: d[k] d.get(k) d.has_key(k) d.fromkeys(...) d.setdefault(...) k in d -- ~Ethan~

On Mon, Oct 07, 2013 at 02:55:44PM -0700, Ethan Furman wrote:
I fear you are right, particularly for subclasses. dict itself would only have the cost of testing whether the instance is an actual dict, which I presume is cheap. Still, given enough "cheap" tests, the overall performance hit could be significant. [...]
One minor point, being a dunder method, it should be grabbed from the class itself, not the instance: getattr(type(self), ...)
Yes. -- Steven

On 8 Oct 2013 07:26, "Steven D'Aprano" <steve@pearwood.info> wrote:
The problem is doing this in a way that keeps a strong reference to the original key (and produces that when iterating) while doing the lookup based on the transformed keys. That said, with the current plan to lower the barrier to entry for PyPI dependencies (I should have the 3.4 only ensurepip proposal written up some time this week), I think it makes sense to let this one bake on PyPI for a while. I think there *is* a potentially worthwhile generalisation here, but I'm far from sure "is-a-dict" is the right data model - for composability reasons, it feels like a "has-a" relationship with an underlying data store may make more sense. (If performance is critical, you're going to write a dedicated type anyway, so composability and simplicity strike me as more important criteria at this point). Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

On Tue, 8 Oct 2013 08:31:46 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
"the current plan to lower the barrier to entry for PyPI" sounds a lot like the obsession du jour to me :-) It's not like "ensurepip" makes it cheaper / more attractive to add dependencies. It just provides a better experience for those who *want* to use pip (and would otherwise have installed it using an explicit download).
It doesn't work. Your "underlying mapping" can show too much variation for the wrapper/proxy to have sane semantics. For example, how do you combine with a defaultdict or a WeakKeyDictionary, knowing that the wrapper/proxy has to have its own internal mapping as well?
A dedicated CaseInsensitiveDict won't be much faster than TransformDict(str.casefold). Regards Antoine.

On 8 Oct 2013 18:36, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
It's OK if the key transforming API has to constrain the behaviour of the underlying mapping or require an appropriately designed transform function to handle more esoteric containers. Either would still be better than categorically *disallowing* composition to the point where you can't even compose it with OrderedDict. ChainMap doesn't compose sensibly with arbitrary mappings like defaultdict, but composition is still the right design choice because it works well with *most* custom mappings. It's not that I think this is necessarily a *bad* idea (although the composability problem gives me grave doubts), it's that I think it's not an *urgent* idea, so why rush to review and include it in the weeks remaining before the beta, when the option of publishing it as a recipe or a PyPI module remains available? Enums eventually made it in because we wanted to *use* them to dramatically improve error messages from other stdlib modules. What's the comparable end user payoff for including TransformDict in 3.4 rather than 3.5 (or never)? Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

Le Tue, 8 Oct 2013 22:12:02 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
Well, you could ask the same question about OrderedDict, defaultdict or Weak*Dictionary since neither of them use composition :-)
ChainMap is easy to compose since it doesn't have to keep any data-driven internal state.
It's just that I disagree we're rushing anything. The feature is fairly simple, many people have already had a need for something like that. (and amongst those people absolutely *zero* have said the design of the feature proposal is inadequate) Regards Antoine.

On 8 Oct 2013 22:31, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
We *did* ask the same question about those (except the Weak* variants, simply due to age). Each time, the point that each new dict variant would be used to justify yet *more* variants was a cause for concern. defaultdict made it through because it's just a convenience API for the underlying "key missing" protocol, while OrderedDict spent time maturing outside the standard library first.
Yet composing ChainMap with defaultdict still breaks, because it's a conceptually incoherent thing to do. "Composition doesn't work with some mappings" isn't an adequate answer to the criticism that a composition based design could work with more mappings than just the builtin dict.
Except the one who wanted to combine it with OrderedDict. Users won't be able to combine it with ChainMap either. The concrete container vs container-wrapper design decision is a fundamental one and I don't believe the current PEP adequately makes the case in favour of the former. Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

On 9 Oct 2013 01:07, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
Just that the comment I replied to is getting very close to the argument "we have so many non-composable mapping variants already, what's the harm in adding one more?". I believe potentially enabling that argument in the future was cited as a downside for all of defaultdict, OrderedDict and Counter.

Good evening, On Fri, 4 Oct 2013 13:38:05 -0700 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
You can also add to the discussion by trying out your own usability tests on people who haven't been exposed to this thread or the pep.
I think "usability tests" should be conducted on people who actually have a need for the API. Otherwise they simply don't make sense: if you don't need an API, then you don't have to learn / understand it either. As an example, if you conduct random usability tests about "yield from" (PEP 380, accepted) or single-dispatch generic functions (PEP 443, accepted), you'll probably get a negative outcome, especially on students. Or if you conduct usability tests about the ssl module on someone who's never done any network programming, you'll get the similar kind of negative results.
Well, the documentation is the place where we give examples.
Is it any different for e.g. defaultdict? Because the mechanics are exactly the same: a generic construct which you can specialize for various use cases.
Yet the generic solution is applicable to far many cases than the specialized one. I'm not against adding a CaseInsensitiveDict, but that would be a rather bizarre thing to do given we can add a generic construct that's far more powerful, and not significantly more difficult.
:-) The key answer is: when you want to retain the original key.
It shouldn't be difficult, actually, because it doesn't make sense to "choose" at all. The use cases for OrderedDict, Counter, TransformDict and defaultdict are completely different.
Is that actually a problem?
The language fits in your head, but the stdlib doesn't. I don't think it has done so for ages :-) I'm not proposing TransformDict as a builtin, though. Regards Antoine.

On Oct 4, 2013, at 2:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
You're right. Students don't make the best test subjects. It might be nice to present this at a Python meet-up or somesuch. Or some people on this list can present it at work to see how their colleagues do with it. Also, it might be nice to get feedback from existing users of IdentityDicts or CaseInsensitiveDicts to see if they are bothered by the implementation having two underlying dictionaries. Raymond

In article <C4C036B6-130C-4718-BEB1-A7C9230082F4@gmail.com>, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I agree. I personally think being able to transform keys would be much more useful as a property of existing dictionaries. I often use case-insensitive keys. But I use them with dict and OrderedDict (and probably ought to use defaultdict, as well). TransformDict is neat, but I'd personally be happier seeing this as a 3rd party library for now. -- Russell

On Oct 28, 2013, at 1:16 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
so what is the status of the PEP 455 (TransformDict)?
I'm giving a thorough evaluation of the proposal and am devoting chunks of time each weekend to reviewing the email threads, the links provided in the PEPs, looking at how well the TD fits in existing code. I'm giving this work priority over my own list of things to add to 3.4 (most of which will now have to wait until 3.5). This week, I'm teaching a five-day intermediate python class to highly experienced network engineers in Denver. We'll do some exercises using the TD and evaluate the results against alternative approaches. Here are some preliminary notes (in no particular order): * A first reading of the python-dev threads suggests that the two-dict TD implementation seems better suited to implementing a case-folding-case-preserving dictionary and is far less well suited for a case-folding dict or an identity dict. * There are interesting differences between the proposed TD and the CaseInsensitiveDict implemented in Kenneth Reitz's HTTP requests library. The latter keeps the last key added rather than the first. It also has a cleaner implementation and the API is a bit nicer (no getitem() method). * The "originals" dict maps a transformed key back to its first saved original value. An alternative would be to map back to a set of original values or a list of original values. * A possible use case is for a unicode normalizing dictionary where 'L' + chr(111) + chr(776) + 'wis' would match 'L' + chr(246) + 'wis'. * The implementation looks rough at this point, but that is easily fixed-up. I'll leave some specific suggestions on the tracker (getting it to accept a list of tuples in the constructor, a recursive repr, renaming the getitem() method, deprivatizing the attributes, getting it to work with __missing__, etc). * Having two-mappings-in-one seems to be the most awkward part of the design and wouldn't be needed for the two strongest use cases, a case-insensitive-but-not-case-preserving dict and an identity dict. * In http://stackoverflow.com/questions/13230414, the OP wants a CI dict but doesn't need the case preserving aspect. The OP is provided with a regular dictionary containing mixed case keys and needs to compare a list of potential matches of unknown case. Copying the dict to a case-folding TD wouldn't provide any incremental advantage over building a regular dict with lower case keys. * In http://www.gossamer-threads.com/lists/python/python/209527, the OP wanted an ID comparison for a symbolic calculation. The objects already had an eq function and he wanted to temporarily bypass that in a symbol lookup. Essentially he needed a dictionary that would allow the use of an alternative equality function. A TD would work here but there would be no need for the key preserving feature. There doesn't seem to be any advantage over using a regular dict that directly stores the id() as the key. * In https://mail.python.org/pipermail/python-ideas/2010-May/007235.html, the OP wants a highly optimized identity dictionary that doesn't call an expensive id() function. The proposed TD doesn't fulfill this request -- it would be far from being optimized and would call id() on every store and every lookup. * In http://msdn.microsoft.com/en-us/library/xfhwa508.aspx, the API describes a dict with an alternative equality comparison. This is a different design than the TD and is for a world that is somewhat different from Python. In that dict, eq and hash are specified at the dict level rather than object level (in Python, the key objects define their own __eq__ and __hash__ rather than having the user attach the functions directly to the dictionary). * In http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html, the IdentityHashMap is described as being for rare use cases such as topology-preserving object graph transformations likes serialization or deep-copying. Looking at Python's own code for copy.deepcopy(), the TD would not be a good replacement for the existing code (more memory intensive, slower, no use for the originals dict, and no use for most of the TD functionality). It appears that it is simpler and faster to store and lookup d[id(obj)] than to use a TD. * If this were being modeled in a database, we would have one table with a many-to-one mapping of original keys to transformed keys and another table with a transformed key as the primary key in a table of key-value pairs. This suggests two lookup patterns original->tranformed->value and transformed->all_originals. * The Apache case insensitive dict documentation includes these thoughts: "This map will violate the detail of various Map and map view contracts. As a general rule, don't compare this map to other maps. In particular, you can't use decorators like ListOrderedMap on it, which silently assume that these contracts are fulfilled. --- Note that CaseInsensitiveMap is not synchronized and is not thread-safetiveMap.html --- This map will violate the detail of various Map and map view contracts. As a general rule, don't compare this map to other maps. In particular, you can't use decorators like ListOrderedMap on it, which silently assume that these contracts are fulfilled." * Using ACK to search through Django, it looks like there over a half-dozen case-insensitive lookups with are implemented using d[k.lower()] and wouldn't be better-off using the proposed TD. The notes above are rough and I still have more work to do: * close reading of the remaining links in the PEP * another round of user testing this week (this time with far more experienced engineers) * review the long python-dev thread in more detail * put detailed code suggestions on the tracker All of this will take some time. I understand that the 3.4 feature deadline is looming, but I would like to take the time to thoroughly think this through and make a good decision. If I had to choose right now, a safe choice would be to focus on the primary use case and implement a clean CaseInsensitiveDict without the double-dict first-saved case-preserving feature. That said, I find the TD to be fascinating and there's more work to do before making a decision. Hopefully, this post will make the thought process more transparent. Cheers, Raymond

On Wed, 30 Oct 2013 01:12:03 -0600, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Please be aware that the PEP author's motivation in submitting the PEP was to have a case insensitive, case *preserving* dict. The generalization serves to make the new datatype more useful, but if the end result doesn't satisfy the original use case of the author, I won't be surprised if he has no motivation to work on it further :). --David

It strikes me that there could be an alternative approach to some of the use cases discussed here. Instead of a new type of dictionary, the case-insensitivity problem could be solved with something akin to a * CaseInsensitiveString* class used for keys within a standard dictionary. This would be very similar to a normal string except with comparison and hashing. It would mean that CaseInsensitiveString("Foo") is considered equal to CaseInsensitiveString("foo") and allow code such as the following:
This would obviously also be usable in other places where case-insensitive strings are required. Just my two pence/cents/other minor currency units. Nigel On 30 October 2013 14:18, Ethan Furman <ethan@stoneleaf.us> wrote:

On 10/30/2013 09:34 AM, Nigel Small wrote:
It strikes me that there could be an alternative approach to some of the use cases discussed here. Instead of a new type of dictionary, the case-insensitivity problem could be solved with something akin to a *CaseInsensitiveString* class [...]
The nice thing about the TransformDict is it is usable for much more than simple case-insensitivity. -- ~Ethan~

True, but I could similarly argue that the nice thing about CaseInsensitiveString is it is usable for much more than dictionary keys - it just depends on your point of view. There would be nothing stopping other types of dictionary key transformation being covered by other key data types in a similar way, I'm simply trying to raise the question of where the genericity could sit: in the dictionary or in the key. Nigel On 30 October 2013 17:04, Ethan Furman <ethan@stoneleaf.us> wrote:

On Wed, 30 Oct 2013 16:34:33 +0000 Nigel Small <nigel@nigelsmall.com> wrote:
And how does a case-insensitive string compare with a normal (case-sensitive) string? This is a can of worms. (if you answer, please don't answer in this thread but open a separate one for case-insensitive strings, thanks) Regards Antoine.

And how does a case-insensitive string compare with a normal (case-sensitive) string? This is a can of worms.
I was wondering this myself. I suspect it would depend which string is on the left hand side of the comparison operator, yes? Can of worms, indeed. implicit-insensitve-i-n-ly, y'rs, Skip

Hi Raymond, On Wed, 30 Oct 2013 01:12:03 -0600 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Thanks for the thorough status report.
First-vs-last has already been discussed in the previous thread. My initial hunch was to keep the last key, but other people made the point that first was both more compliant (with current dict behaviour) and more useful (since you can override it by deleting and then reinserting the entry). Regards Antoine.

Sorry I missed the original discussion, but isn't this a simple case of putting a decorator around the getitem method (to transform the input key) and a single line in the body of the setitem method, making this very easy adaptation of the existing dict class? Mark

On Mon, Oct 07, 2013 at 06:17:09PM -0700, Mark Janssen wrote:
Not really. We can try what you suggest to implement a case insensitive dict (decorator is not needed): py> class CaseInsensitiveDict(dict): ... def __getitem__(self, key): ... key = key.casefold() # use .lower() before Python 3.3 ... return super().__getitem__(key) ... def __setitem__(self, key, value): ... key = key.casefold() ... super().__setitem__(key, value) ... py> d = CaseInsensitiveDict() py> d['X'] = 42 py> d {'x': 42} Well, that's not exactly what I was hoping for... I was hoping that the dict would preserve case, rather than just convert it. But that's only the start of the problem: py> d['X'] # this works 42 py> d.pop('X') # expecting 42 Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'X' So no, it isn't just a matter of a trivial wrapper around __getitem__, __setitem__ and __delitem__. Check the bug tracker for more detail: http://bugs.python.org/issue18986 -- Steven

On Fri, Sep 13, 2013 at 10:40 PM, Antoine Pitrou <solipsis@pitrou.net>wrote:
Hello. Overall I think that's a great idea. Here are some questions on it though. I'm sorry if some of these have already been discussed in some other thread. 1. Thread safety. PEP doesn't mention anything about thread safety while the implementation proposed in the tracker is (very) not thread-safe. I think, PEP should mention that this class have no guarantees. 2. Extra dict. There should be a way to avoid creation of the second dict when there is no need to store original keys. For example, email.message module doesn't store original headers as they are not needed. The same applies to web frameworks parsing HTTP headers or WSGI environment. I'm sure I had another one. I'll send it once I remember. -- Kind regards, Yuriy.

On Tue, Oct 8, 2013 at 10:17 PM, MRAB <python@mrabarnett.plus.com> wrote:
If you don't need the original key, then you might as well just use a transform function with a dict.
As I understood, storing original keys is not the purpose of TransformDict, allowing hashing on something other then provided key itself is. This part presents interest for the cases I mentioned. -- Kind regards, Yuriy.

On Tue, 08 Oct 2013 22:02:43 +0400, Yuriy Taraday <yorik.sar@gmail.com> wrote:
This is not true. email.message *very carefully* preserves the original header name, case and all [*], that's part of its mandate (faithfully reproducing the original parsed message). That said, email.message can't use transformdict, since email.message needs a list-with-case-insensitive-keyed-lookup, not a dict, because it also preserves the original *order* of the headers. And it couldn't use an orderedtransformdict, either, since it also has to preserve *duplicate* headers. --David [*] currently it loses whitespace information in certain cases, but the new header parser/folder is supposed to fix that...or will once I fix a few more corner cases :)

13.09.13 21:40, Antoine Pitrou написав(ла):
Please use str.casefold in examples.
>>> d = TransformDict(str.lower, [('Foo': 1)], Bar=2)
{'Foo': 1} or [('Foo', 1)].
Except lightweight IdentityDict which can be implemented more efficient than TransformDict(id). It doesn't need in calling the transform function, computing the hash of transformed key, comparing keys. But perhaps in many cases TransformDict(id) is enough.
Also copy, json, cProfile, doctest and _threading_local.

On Fri, 13 Sep 2013 22:31:02 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
Ok, thanks.
That's true. But it's only important if TransformDict is the bottleneck. I doubt the memoizing dictionary is a bottleneck in e.g. the pure Python implementation of pickle or json.
Thanks, will add them. Regards Antoine.

On Fri, 13 Sep 2013 16:54:01 -0300 "Joao S. O. Bueno" <jsbueno@python.org.br> wrote:
I see the PEP does not contemplate a way to retrieve the original key - like we've talked about somewhere along the thread.
Indeed. If that's important I can add it. I was hoping to keep very close to the MutableMapping API, to make the PEP as few controversial as possible. Regards Antoine.

On Fri, 13 Sep 2013 23:18:39 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
Ok, I have a better (IMO) proposal: >>> d = TransformDict(str.casefold, {'Foo': 1}) >>> d.getitem('foo') ('Foo', 1) >>> d.getitem('bar') Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'bar' Regards Antoine.

On Fri, Sep 13, 2013 at 11:45:16PM +0200, Antoine Pitrou wrote:
Is it more common to want both the canonical key and value at the same time, or to just want the canonical key? My gut feeling is that I'm likely to have code like this: d = TransformDict(...) for key in data: key = d.get_canonical(key) value = d[key] print("{}: {}".format(key, value)) in which case having a single call to return both will be great: for key in data: key, value = d.getitem(key) print("{}: {}".format(key, value)) but I'm really not sure. Maybe Ethan is right. I think these sorts of associated questions are why some people (Raymond, Nick) want to see the proposal live outside of the standard library for a while first. The general idea is great, but we're going to bike shed the API without having much idea of how it will actually be used. So, my suggestion is this: - Let's add __transform__ to dicts for 3.4, similar to __missing__, and see how people end up using it in the real world. - Then, in 3.5, we can make a much better informed decision about the best API for a TransformedDict front-end to __transform__. +1 on __transform__ method on dicts. +0 on TransformedDict in 3.4 +1 on waiting for 3.5 based on experience on using __transform__. -- Steven

On 09/13/2013 05:49 PM, Steven D'Aprano wrote:
+1 on __transform__ method on dicts.
What would __transform__ do? Just canonicalize the key, or also save the original key? How much front-end work will each user have to do to actually use this new magic method to good effect? Personally, if there's a bunch of push-back against just adding TransformDict directly, why don't we make it provisional? I thought that was what provisional was for (meaning: we're going to add it, PyPI is not really appropriate, there may be some API changes). -- ~Ethan~

On Fri, Sep 13, 2013 at 06:00:18PM -0700, Ethan Furman wrote:
Not according to PEP 411. It implies that only modules/packages can be provisional, not individual functions, and states that "most packages" are expected to be provisional. So either PEP 411 doesn't apply to TransformDict at all, or it applies by default. The PEP doesn't say. http://www.python.org/dev/peps/pep-0411/ Everything below the line is about PEP 411, not TransformDict. If you don't care about PEP 411, you can stop reading now. ========================== Personally, I think it's a poor PEP. It doesn't document opposition to the idea, and if I recall the discussion at the time correctly, there was plenty of opposition. - Since people cannot rely on provisional features still being available in the future, if they care the slightest about forward compatibility, they dare not use them. - If people do use them, and we care about backwards compatibility, we dare not remove provisional packages without going through the same deprecation process as for ordinary packages. - It relies on people reading the documentation and noticing that a package is marked "provisional". Like that's going to happen. None of these arguments are documented in the PEP, let alone refuted. Even if the decision to approve the PEP ends up being vindicated, I think it was poor form for the PEP to ignore arguments against. I don't think that "provisional" helps end users at all. If anything, it hurts them -- it means more uncertainty and doubt. Packages may languish in provisional status indefinitely. Speaking as an end user, I used to know that once a feature hit the std lib, it was stable and wouldn't be removed without going through a lengthy period of deprecation (except under truly remarkable circumstances). But for provisional packages, that promise doesn't apply, and although PEP 411 says that packages won't be gratuitously removed, I have to assume that any provisional package in version 3.x could be gone without notice or warning in 3.x+1. I don't see how that helps me. It just means that for some indefinite period, there's a package in the std lib doing exactly what I want that I dare not use in production software because there are no stability guarantees. The PEP has a lengthy section that claims to be about how it will help end users. It actually isn't -- it is about how inclusion in the standard library helps end users. Not surprisingly, there's nothing about why reserving the right to rip out a package without warning is good for end uers, since it isn't. End of rant. -- Steven

As will become evident, I disagree with Steven at almost every point. However, I think his point about users not reading documentation is well-taken. Due to hyperlinking, users are very likely to skip past module docstrings. I suggest two (perhaps informal) additions to the documentation policy in PEP 411: (1) Provisional packages should be explicitly described as such in release announcements. I think this is done already, cf. "Improvements in email" in the Release announcement and What's New for Python 3.3.0.[1] However the policy should be explicit and the word "provisional" should be emphasized and linked to PEP 411. (2) Individual APIs in those packages should be documented as provisional. Steven D'Aprano writes:
You exaggerate to the extent that you harm your argument. People make tradeoffs. Some will choose the extreme you describe here, others will take more risk. Your real argument is that the benefits are small because you expect that provisional stdlib packages will not get much if any additional use over PyPI packages. Can't know without trying ... thus, the PEP. Similar considerations apply to your other arguments.
None of these arguments are documented in the PEP, let alone refuted.
They're not real arguments, as stated. If you rephrase them as actual arguments, they turn out to merely be the "half-empty" phrasing of the "half-full" arguments used in favor. Can't know without trying.
I don't think that "provisional" helps end users at all. If anything, it hurts them -- it means more uncertainty and doubt.
True, it *increases* the uncertainty *inside* the stdlib. On the flip side, it decreases uncertainty *overall* by announcing that the core developers *want* this feature in stdlib *now*, and probably the API won't change *at all*, and very probably the API won't change "much".
It's unlikely it will be "gone", it will end up on PyPI.
I don't see how that helps me.
Since you evidently demand absolute certainty, you're right, it doesn't help you with respect to any given provisional package. I'm more flexible, and I greatly value having the core developers evaluate the modules on PyPI (or the modules that have only seen the flourescent lights of Google Labs up to now) for me, and putting their Good Codesmithing Stamp of Approval on some of them more quickly. And if in fact the policy works in the sense that more code gets into the stdlib (non-provisionally) faster than without the PEP, in the long run *you* benefit even with if you adopt a policy of using only non-provisional APIs.
Right. And this PEP is about how to get good code into the stdlib faster. It's a means to that intermediate end, and transitively a means to the real end of helping end users.
What it really comes down to is you're saying you fear you will frequently disagree with the judgment of python-dev with regard to the options provided by "provisional". You want them bound by hard and fast rules. That's fine; everybody has their own requirements. Nevertheless, it's logically possible that as long as you pay attention to "provisional" markings this PEP will be a win for you. I agree with the PEP proponents that "logically possible" can be strengthened to "probably true", but of course YMMV. Footnotes: [1] http://www.python.org/download/releases/3.3.0/ http://docs.python.org/3.3/whatsnew/3.3.html

On 14 September 2013 12:44, Steven D'Aprano <steve@pearwood.info> wrote:
Correct, that's exactly what they should do.
Correct, it doesn't help you until it graduates from provisional status. Less risk averse users may choose to use it sooner (since any removed modules would almost certainly find a home on PyPI in fairly short order)
"We expect that most of the API of most provisional packages will be unchanged at graduation. Withdrawals are expected to be rare." Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 14 September 2013 12:44, Steven D'Aprano <steve@pearwood.info> wrote:
Oops, meant to reply to this part, too. PEP 411 is an Informational PEP, not a standards track PEP. It's there to describe the policy, not to make the case for *having* the policy. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 13, 2013 at 06:00:18PM -0700, Ethan Furman wrote:
Sorry, I missed replying to this part. See Serhiy's post: https://mail.python.org/pipermail/python-dev/2013-September/128633.html and suggested patch here: http://bugs.python.org/issue18986 -- Steven

On Sat, 14 Sep 2013 10:49:52 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
I don't think it really matters. From getitem() it's trivial to extract the key alone.
Even after it's used, it will still be bikeshedded: such is how proposals are generally discussed. There really isn't very much to decide here, and whether we only return a key or a (key, value) pair is almost cosmetic: both APIs are reasonably convenient.
Well, __missing__ isn't used very much, I think. People use defaultdict. (note that I'm not even proposing __transform__ right now :-))
Ok, thanks. Regards Antoine.

On 14/09/2013 01:49, Steven D'Aprano wrote:
I think must be missing something. I thought that iterating over the dict would yield the original keys, so if you wanted the original key and value you would write: for key, value in data.items(): print("{}: {}".format(key, value)) and if you wanted the transformed key you would apply the transform function to the key.

On 09/13/2013 06:25 PM, MRAB wrote:
Well, that's certainly how I would do it. ;)
and if you wanted the transformed key you would apply the transform function to the key.
Indeed. The question is: how? It is entirely possible that your function has a TransformDict alone, and no memory of the transform function used to create the dict... If the key transform function were saved directly on the TransformDict instance as, say, .transform_key, then problem solved. -- ~Ethan~

On Fri, Sep 13, 2013 at 06:40:06PM -0700, Ethan Furman wrote:
While I think it is good and proper to have the transformation function available, if you're manually applying the transformation function then IMO chances are good that you've missed the whole point of the transformdict. Think of defaultdict. The purpose of defaultdict is to avoid needing to check whether the key is missing or not. If I suggested that the way to use a defaultdict was to write code like this: value = d[key] if key in d else d.default_factory() I'd be missing the point. Likewise, the point of transformdict is to avoid needing to care about the transformation function 99% of the time. -- Steven

On 09/13/2013 08:33 PM, Steven D'Aprano wrote:
Likewise, the point of transformdict is to avoid needing to care about the transformation function 99% of the time.
No one's arguing against that point. It's the 1% of the time that being able to get the canonical name back is a Good Thing, and the best way to do that is to have the transform_key function easily available. -- ~Ethan~

On 13 September 2013 22:40, Ethan Furman <ethan@stoneleaf.us> wrote:
I hope you are aware that this pattern does not help when one wants _one_ canonical key having a non-canonical one, besides having to linearly walk through all keys and check the "__transform__" to each one. I mean - given no function to retrieve the canonical key, one would have to resort to: my_key = data.__transform__(given_key) for key, value in data.items(): if data.__transform__(key) == my_key: .... WHich would defeat not only the purpose of a Transform dict, but the purpose of a dict alltogether. (of course, the one obvious way to do it would be to have the original key stored along the value - which is just a bit less silly than the example above) OTOH, having a `get_canonical_key` method or similar seens trivial enough- if the only provided method retrieves the value as well, it would not be that bad.

On 09/13/2013 09:53 PM, Joao S. O. Bueno wrote:
True, but I was thinking Steve was talking about printing the entire dict, in which case that is, indeed, how I would do it.
Which is exactly why I, and others, would like to have the transform function easily available. Besides being able to use it to get a canonical key, one could use it to get the function itself. Yay, introspection! -- ~Ethan~

On Fri, 13 Sep 2013 21:59:11 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Well, no, you misunderstand :) The transform function takes an original key (perhaps "canonical") and returns the transformed key, it can't do the reverse which is what getitem() does. i.e.:
What getitem() does is make the surjection bijective by restricting its input domain to the set of stored keys. Regards Antoine.

14.09.13 20:41, Antoine Pitrou написав(ла):
There is one reason -- serialization. For example pickle saves the transform function as an argument for TransformDict constructor. Repr exposes the transform function too (in evaluable representation). Other serializers need the transform function too. My implementations expose it as public property (transform), your -- as private attribute (_transform). Or perhaps I misunderstood you?

15.09.13 00:58, Antoine Pitrou написав(ла):
The same functools has decorating_function() with the user_function argument. Also find_function, search_function, create_function, isfunction, isgeneratorfunction, from_function, has_function, copy_function, pickle_function, register_function, emit_function, and print_function. This is not counting tests, IDLE and private names.

On Sat, Sep 14, 2013 at 02:25:42AM +0100, MRAB wrote:
On 14/09/2013 01:49, Steven D'Aprano wrote:
You're missing that I'm not iterating over the entire dict, just some subset ("data") that I got from elsewhere. In real code, of course I would have to handle missing keys. My gut feeling is that normally I would want both the canonical key and the value, not just any valid key: # not this for key in data: value = d[key] print("{}: {}".format(key, value)) # but this for key in data: key, value = d.getitem(key) print("{}: {}".format(key, value)) but that's just a gut feeling, and I daresay that it would depend on the application.
and if you wanted the transformed key you would apply the transform function to the key.
The caller may not know or care what the transformation function is. The caller may only care about one or two things: - does this key match a key in the dict? - what is the canonical version of this key? For example, in a case-preserving file system, the canonical version of a file name is whatever the user first typed when they created the file: create file "SoMe FiLe" so the canonical version is, "SoMe FiLe". But any of these may be expected to match: some file SOME FILE Some File some%20file etc. It is *not* simply the case that the canonical version is "some file". Certainly that's not how case-preserving file systems work. -- Steven

On 09/13/2013 08:18 PM, Steven D'Aprano wrote:
You're missing that I'm not iterating over the entire dict, just some subset ("data") that I got from elsewhere.
Ah, okay. Between you and Antoine I am convinced that .getitem() is a good thing. So have that and .transform_key and we're golden! :) -- ~Ethan~

On Fri, 13 Sep 2013 20:40:58 +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
This motivation would be stronger if the last phrase was something like "where the keys are textual strings whose case is specified to be ignored on receipt but by either specification or custom is to be preserved or non-trivially canonicalized when retransmitted."
(it can be said that the pattern *projects* keys from the user-visible set onto the internal lookup set, hence this PEP's title)
Not clear what "projects" has to do with the PEP title.
You don't mention the alternate constructor discussion, or the rationale for following the defaultdict pattern (ie: following the established pattern :) --David

On Fri, 13 Sep 2013 16:09:10 -0400 "R. David Murray" <rdmurray@bitdance.com> wrote:
Thanks, will add.
The PEP was originally titled "a key-projecting dictionary" and I've changed it as it was too obscure. I've now also removed the obsolete part of the sentence above :-)
Ok, will do that too. Regards Antoine.

13.09.13 21:40, Antoine Pitrou написав(ла):
Alternative proposals and questions ===================================
Yet one alternative proposal is to add the dict.__transform__() method. Actually this not contradict TransformDict, but generalize it. TransformDict will be just handly interface to __transform__() as defaultdict to __missing__(). It provides only constructor, repr and pickling. My second patch for http://bugs.python.org/issue18986 shows that as implementation of dict.__transform__(), so implementation of TransformDict using dict.__transform__() are simple enough.

13.09.13 23:21, Antoine Pitrou написав(ла):
Both. On one side, with this proposition TransformDict itself doesn't deserve PEP. It will be trivial and obvious thing. Not very important. On other side, TransformDict can be implemented even without dict.__transform__(), on pure Python.

You know, I can think up several use cases for the case-insensitive idea. Not sure if I missed something, but there should be a function to retrieve the original key from the modified. i.e.:
On Fri, Sep 13, 2013 at 1:40 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Ryan

On Sat, 14 Sep 2013 14:33:56 +0900 Larry Hastings <larry@hastings.org> wrote:
Well, a TransformSet is like a normal dict, you just need to call the transformation function yourself when inserting the keys. i.e.: d = TransformSet(str.lower) d.add('Foo') is the same conceptually as: d = {} d['Foo'.lower()] = 'Foo' d['foo'] # gets the original key Regards Antoine.

On 09/14/2013 07:30 PM, Antoine Pitrou wrote:
s/normal dict/normal set/ But, then, a TransformDict is like a normal dict, you just need to call the transformation function yourself when inserting the keys. Yet a TransformDict is a useful enough concept that it is being proposed for addition; I was wondering if a TransformSet would be useful, too. But I suppose we should take baby steps here. //arry/

15.09.13 14:23, Antoine Pitrou написав(ла):
A TransformDict is like a normal dict, you just need to call the transformation function yourself when inserting the keys and insert a pair of (orig_key, value), or is like two normal dicts, you just need to call the transformation function yourself when inserting the keys and update parallel mapping from transformed keys to original keys.

On 16 September 2013 00:45, Serhiy Storchaka <storchaka@gmail.com> wrote:
I think it's best to hold off a TransformSet, since you can trivially emulate it by setting the values in a TransformDict to None (that's how Python itself lived without a set type for so long: people just used dicts with all the values set to None). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, 14 Sep 2013 12:31:36 -0700 Guido van Rossum <guido@python.org> wrote:
The discussion has settled down for the most part, and there is a consensus amongst participants on the desireability of the feature and its characteristics. The implementation is straightforward pure Python (Serhiy's C proposal should probably be a separate enhancement request on the tracker). I think the proposal will soon be ready for a pronouncement - unless other concrete questions and suggestions are recorded. Thanks Antoine.

On Sep 22, 2013, at 6:16 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
Are we close to asking for pronouncement?
When you're ready, let me know. In the meantime, I conducting usability tests on students in Python classes and researching how well it substitutes for existing solutions for case insensitive dictionaries (the primary use case) and for other existing cases such as dictionaries with unicode normalized keys. If you want to participate in the research, I could also use help looking at what other languages do. Python is not the first language with mappings or to encounter use cases for transforming keys prior to insertion and lookup. I would like to find out what work has already been done on this problem. Another consideration is whether the problem is more general that just dictionaries. Would you want similar functionality in all mapping-like objects (i.e. a persistent dictionaries, os.environ, etc)? Would you want similar functionality for other services (i.e. case-insensitive filenames or other homomorphisms). You can also add to the discussion by trying out your own usability tests on people who haven't been exposed to this thread or the pep. My early results indicate that the API still needs work. * When shown code that uses a TransformDict, students don't seem to be able to deduce what the code does just from the context (this contrasts with something like OrderedDict and Counter where the name says what it does). * When given a description of the mechanics of a TransformDict, they don't seem to be able to figure-out what you would do with it without being given an example. * When given a example of using a TransformDict, they understand the example but don't seem to be able to come-up with other examples other than the one they were just shown. And when shown multiple examples, they can't think of other use cases where they've ever needed this in their own code. * This contrasts with the results when I show something less general like a CaseInsensitiveDict. People seem to get that right away. As you might expect, the generalized solution is harder to wrap your head around than a specific solution with a clear name. * One student asked, "why give regular dicts a key-function like sorted(), min() and max()?" I didn't have a good answer, but I haven't yet had time to read this whole thread. * Another issue is that we're accumulating too many dictionary variants and that is making it difficult to differentiate and choose between them. I haven't found anyone (even in advanced classes with very experienced pythonistas) would knew about all the variations: dict, defaultdict, Mapping, MutableMapping, mapping views, OrderedDict, Counter, ChainMap, andTransformDict. David Beazley on twitter recently proposed that we add a MinDict and MaxDict. There seems to be no shortage of ideas of things that can be done with dictionaries. Besides choosing among the dict variants, there is also confusion about other mapping topics such as 1) when to subclass from dict rather than inherit from MutableMapping, 2) the difference between defaultdict(int) and Counter's use of __missing__ to return zero, and 3) it seems that many experienced users can't even name all the existing methods on dictionaries (they forget clear(), copy(), pop(), popitem(), setdefault(), update() and the fromkeys() classmethod). Overall, my impression at this point is that key transformations are useful, but I'm not sure how to incorporate them without taking Python further away from being a language "that just fits in your head." Raymond

2013/10/4 Raymond Hettinger <raymond.hettinger@gmail.com>:
Ok, but none of these classes address use cases described of the PEP 455. If it became hard to choose the best container for an use case, it's maybe a documentation issue. The PEP 455 contains a long list of existing implementations, so it means that these use cases are common (even if the Python stdlib according to the PEP). It's a good thing that Python proposes a standard implementation (efficient, well tested, documented, etc.) to answer to these use cases. I'm not convinced by your usability test. The problem is maybe the name, TransformDict. We may find a more explicit name, like TranformKeyDict or NormalizedKeyMapping. Or we can use names of the Transformers movies: OptimusPrimeDict, BumblebeeMapping, JazzDictionary, etc. (If we cannot find a better name, we may add more specialized classes: KeyInsensitiveDict and IdentiyDict. But I like the idea of using my own "transform" function.) Victor

On Oct 4, 2013, at 2:06 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
I'm not convinced by your usability test.
You're not the one who needs to be convinced ;-) Please do conduct your own API tests and report back. This is necessary for a new class like TransformDict that was constructed from scratch and proposed for direct admission to the standard library. This contrasts with other tools like OrderedDict, ChainMap, and namedtuple which started their lives outside the standard library where we we able observe their fitness for real problems being solved by real users. None of my consulting client's have anything like a general purpose transforming dict in their utility modules, so we lack the real world experience that informed the design of the other tools in the collections module. To make up for that lack of information, we need to put it in front of users as well as do research into how other languages have tackled the use cases. In short, we need to know whether the API will make sense to people, whether their code will be more readable with a TransformDict, and whether the zoo of dict variants should continue to grow. Right now, I don't know those things. All I have to go on is that I personally think the TransformDict is a cool idea. However, that alone isn't sufficient for accepting the PEP. Raymond “… in order to get things merged you need to solve not only just your own problem but also realize that the world is bigger than your company and try to solve things in a way where it makes sense for other people, even if primarily it is for your own situation.” -- Linus Torvalds http://www.extremeta.com/2013/09/linus-torvalds-said-linuxcon-kernel-develop...

2013/10/4 Raymond Hettinger <raymond.hettinger@gmail.com <javascript:;>>:
Please do conduct your own API tests and report back.
I don't understand why there is a need to evaluate a "new" API: TransformDict has the same API than dict. The only differences are that the constructor takes an extra mandatory parameter and there is a new getitem() method. All other methods are the same: len(dict), key in dict, dict.get(key), dict[key]=value, etc.
The class is not "new": it is based on the API of dict, and existing specialized implementations already used in the wild. But I didn't compare TransformDict to existing specialized implementations... Antoine: do you know if your API is different? Maybe in some minor(?) details like storing the first key or last key.
Why do you say that TransformDict has no real use case, whereas similar containers are already used since many years in the Python standard library? Extract of the PEP: "Several modules in the standard library use identity lookups for object memoization, for example pickle, json, copy, cProfile, doctest and _threading_local." I didn't check this whole list, but it looks like some modules can directly use TransformDict(id), see for example the copy module. And "case insenstive" dictionaries are also used in various projects according to the PEP. Or do you mean that there are too few projects or they are not popular enough? In Python, email.Message uses a case-insensitive container preserving the original case, except that it does also transform the value. email.Message is used in http.client to parse HTTP headers. Twisted has its own InsensitiveDict type for that (which does not transform the value). http://twistedmatrix.com/documents/current/api/twisted.python.util.Insensiti...
“… in order to get things merged you need to solve not only just your own problem but also realize that the world is bigger than your company and
Maybe Antoine can provide us a patch on the Python standard library to reuse TransformDict(id) where it is possible. So we can compare the code before/after to see if it's more readable or not. try
to solve things in a way where it makes sense for other people, even if primarily it is for your own situation.” -- Linus Torvalds
http://www.extremeta.com/2013/09/linus-torvalds-said-linuxcon-kernel-develop... Existing implementations are specific (id() or str.lower()), whereas Antoine's container is generic (transform function). Or do you mean that it's not enough generic? It should do something on the value? Victor

06.10.13 00:08, Victor Stinner написав(ла):
Unfortunately the pickle and the copy modules can't use TransformDict(id) because they expose their mappings (with integer id keys) in public API. At least until Python 4.0. There are no so much use cases for IdentityDict in stdlib now, and even less for CaseInsensityDict. And a workaround is simple. I don't know about usage of TransformDict with other transfom function.

On Fri, Oct 04, 2013 at 11:06:15PM +0200, Victor Stinner wrote:
-1 on a plethora of specialised dicts. I do think that a TransformDict seems useful, and might even *be* useful, but would not like to see a whole pile of specialised dicts added to the std lib. I wonder though, are we going about this the wrong way? Since there is apparently disagreement about TransformDict, that suggests that perhaps we need more concrete experience with the basic idea before graduating to a concrete class in the std lib. Perhaps we should follow the example of dict, __missing__ and defaultdict. The dict class could do something like this on key access: if type(self) is not dict: # This only applies to subclasses, not dict itself. try: transform = type(self).__transform__ except AttributeError: pass else: key = transform(key) # now use the key as usual Am I barking up the wrong tree? Would this slow down dict access too much? -- Steven

On 10/07/2013 02:24 PM, Steven D'Aprano wrote:
Considering that __transform__ would usually not exist, and triggered exceptions are costly, I think it would. From the docs[1]: (10) If a subclass of dict defines a method __missing__, if the key k is not present, the a[k] operation calls that method with the key k as argument. The a[k] operation then returns or raises whatever is returned or raised by the __missing__(k) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__ is not defined, KeyError is raised. __missing__ must be a method; it cannot be an instance variable. For an example, see collections.defaultdict. New in version 2.5. So something more like: transform = getattr(self, '__transform__', None) if transform is not None: key = transform(key) ... A key difference (pun unavoidable ;) between __missing__ and __transform__ is that __missing__ is only called when a key is not found, while __transform__ needs to be called /every/ time a key is looked up: d[k] d.get(k) d.has_key(k) d.fromkeys(...) d.setdefault(...) k in d -- ~Ethan~

On Mon, Oct 07, 2013 at 02:55:44PM -0700, Ethan Furman wrote:
I fear you are right, particularly for subclasses. dict itself would only have the cost of testing whether the instance is an actual dict, which I presume is cheap. Still, given enough "cheap" tests, the overall performance hit could be significant. [...]
One minor point, being a dunder method, it should be grabbed from the class itself, not the instance: getattr(type(self), ...)
Yes. -- Steven

On 8 Oct 2013 07:26, "Steven D'Aprano" <steve@pearwood.info> wrote:
The problem is doing this in a way that keeps a strong reference to the original key (and produces that when iterating) while doing the lookup based on the transformed keys. That said, with the current plan to lower the barrier to entry for PyPI dependencies (I should have the 3.4 only ensurepip proposal written up some time this week), I think it makes sense to let this one bake on PyPI for a while. I think there *is* a potentially worthwhile generalisation here, but I'm far from sure "is-a-dict" is the right data model - for composability reasons, it feels like a "has-a" relationship with an underlying data store may make more sense. (If performance is critical, you're going to write a dedicated type anyway, so composability and simplicity strike me as more important criteria at this point). Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

On Tue, 8 Oct 2013 08:31:46 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
"the current plan to lower the barrier to entry for PyPI" sounds a lot like the obsession du jour to me :-) It's not like "ensurepip" makes it cheaper / more attractive to add dependencies. It just provides a better experience for those who *want* to use pip (and would otherwise have installed it using an explicit download).
It doesn't work. Your "underlying mapping" can show too much variation for the wrapper/proxy to have sane semantics. For example, how do you combine with a defaultdict or a WeakKeyDictionary, knowing that the wrapper/proxy has to have its own internal mapping as well?
A dedicated CaseInsensitiveDict won't be much faster than TransformDict(str.casefold). Regards Antoine.

On 8 Oct 2013 18:36, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
It's OK if the key transforming API has to constrain the behaviour of the underlying mapping or require an appropriately designed transform function to handle more esoteric containers. Either would still be better than categorically *disallowing* composition to the point where you can't even compose it with OrderedDict. ChainMap doesn't compose sensibly with arbitrary mappings like defaultdict, but composition is still the right design choice because it works well with *most* custom mappings. It's not that I think this is necessarily a *bad* idea (although the composability problem gives me grave doubts), it's that I think it's not an *urgent* idea, so why rush to review and include it in the weeks remaining before the beta, when the option of publishing it as a recipe or a PyPI module remains available? Enums eventually made it in because we wanted to *use* them to dramatically improve error messages from other stdlib modules. What's the comparable end user payoff for including TransformDict in 3.4 rather than 3.5 (or never)? Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

Le Tue, 8 Oct 2013 22:12:02 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
Well, you could ask the same question about OrderedDict, defaultdict or Weak*Dictionary since neither of them use composition :-)
ChainMap is easy to compose since it doesn't have to keep any data-driven internal state.
It's just that I disagree we're rushing anything. The feature is fairly simple, many people have already had a need for something like that. (and amongst those people absolutely *zero* have said the design of the feature proposal is inadequate) Regards Antoine.

On 8 Oct 2013 22:31, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
We *did* ask the same question about those (except the Weak* variants, simply due to age). Each time, the point that each new dict variant would be used to justify yet *more* variants was a cause for concern. defaultdict made it through because it's just a convenience API for the underlying "key missing" protocol, while OrderedDict spent time maturing outside the standard library first.
Yet composing ChainMap with defaultdict still breaks, because it's a conceptually incoherent thing to do. "Composition doesn't work with some mappings" isn't an adequate answer to the criticism that a composition based design could work with more mappings than just the builtin dict.
Except the one who wanted to combine it with OrderedDict. Users won't be able to combine it with ChainMap either. The concrete container vs container-wrapper design decision is a fundamental one and I don't believe the current PEP adequately makes the case in favour of the former. Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

On 9 Oct 2013 01:07, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
Just that the comment I replied to is getting very close to the argument "we have so many non-composable mapping variants already, what's the harm in adding one more?". I believe potentially enabling that argument in the future was cited as a downside for all of defaultdict, OrderedDict and Counter.

Good evening, On Fri, 4 Oct 2013 13:38:05 -0700 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
You can also add to the discussion by trying out your own usability tests on people who haven't been exposed to this thread or the pep.
I think "usability tests" should be conducted on people who actually have a need for the API. Otherwise they simply don't make sense: if you don't need an API, then you don't have to learn / understand it either. As an example, if you conduct random usability tests about "yield from" (PEP 380, accepted) or single-dispatch generic functions (PEP 443, accepted), you'll probably get a negative outcome, especially on students. Or if you conduct usability tests about the ssl module on someone who's never done any network programming, you'll get the similar kind of negative results.
Well, the documentation is the place where we give examples.
Is it any different for e.g. defaultdict? Because the mechanics are exactly the same: a generic construct which you can specialize for various use cases.
Yet the generic solution is applicable to far many cases than the specialized one. I'm not against adding a CaseInsensitiveDict, but that would be a rather bizarre thing to do given we can add a generic construct that's far more powerful, and not significantly more difficult.
:-) The key answer is: when you want to retain the original key.
It shouldn't be difficult, actually, because it doesn't make sense to "choose" at all. The use cases for OrderedDict, Counter, TransformDict and defaultdict are completely different.
Is that actually a problem?
The language fits in your head, but the stdlib doesn't. I don't think it has done so for ages :-) I'm not proposing TransformDict as a builtin, though. Regards Antoine.

On Oct 4, 2013, at 2:14 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
You're right. Students don't make the best test subjects. It might be nice to present this at a Python meet-up or somesuch. Or some people on this list can present it at work to see how their colleagues do with it. Also, it might be nice to get feedback from existing users of IdentityDicts or CaseInsensitiveDicts to see if they are bothered by the implementation having two underlying dictionaries. Raymond

In article <C4C036B6-130C-4718-BEB1-A7C9230082F4@gmail.com>, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I agree. I personally think being able to transform keys would be much more useful as a property of existing dictionaries. I often use case-insensitive keys. But I use them with dict and OrderedDict (and probably ought to use defaultdict, as well). TransformDict is neat, but I'd personally be happier seeing this as a 3rd party library for now. -- Russell

On Oct 28, 2013, at 1:16 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
so what is the status of the PEP 455 (TransformDict)?
I'm giving a thorough evaluation of the proposal and am devoting chunks of time each weekend to reviewing the email threads, the links provided in the PEPs, looking at how well the TD fits in existing code. I'm giving this work priority over my own list of things to add to 3.4 (most of which will now have to wait until 3.5). This week, I'm teaching a five-day intermediate python class to highly experienced network engineers in Denver. We'll do some exercises using the TD and evaluate the results against alternative approaches. Here are some preliminary notes (in no particular order): * A first reading of the python-dev threads suggests that the two-dict TD implementation seems better suited to implementing a case-folding-case-preserving dictionary and is far less well suited for a case-folding dict or an identity dict. * There are interesting differences between the proposed TD and the CaseInsensitiveDict implemented in Kenneth Reitz's HTTP requests library. The latter keeps the last key added rather than the first. It also has a cleaner implementation and the API is a bit nicer (no getitem() method). * The "originals" dict maps a transformed key back to its first saved original value. An alternative would be to map back to a set of original values or a list of original values. * A possible use case is for a unicode normalizing dictionary where 'L' + chr(111) + chr(776) + 'wis' would match 'L' + chr(246) + 'wis'. * The implementation looks rough at this point, but that is easily fixed-up. I'll leave some specific suggestions on the tracker (getting it to accept a list of tuples in the constructor, a recursive repr, renaming the getitem() method, deprivatizing the attributes, getting it to work with __missing__, etc). * Having two-mappings-in-one seems to be the most awkward part of the design and wouldn't be needed for the two strongest use cases, a case-insensitive-but-not-case-preserving dict and an identity dict. * In http://stackoverflow.com/questions/13230414, the OP wants a CI dict but doesn't need the case preserving aspect. The OP is provided with a regular dictionary containing mixed case keys and needs to compare a list of potential matches of unknown case. Copying the dict to a case-folding TD wouldn't provide any incremental advantage over building a regular dict with lower case keys. * In http://www.gossamer-threads.com/lists/python/python/209527, the OP wanted an ID comparison for a symbolic calculation. The objects already had an eq function and he wanted to temporarily bypass that in a symbol lookup. Essentially he needed a dictionary that would allow the use of an alternative equality function. A TD would work here but there would be no need for the key preserving feature. There doesn't seem to be any advantage over using a regular dict that directly stores the id() as the key. * In https://mail.python.org/pipermail/python-ideas/2010-May/007235.html, the OP wants a highly optimized identity dictionary that doesn't call an expensive id() function. The proposed TD doesn't fulfill this request -- it would be far from being optimized and would call id() on every store and every lookup. * In http://msdn.microsoft.com/en-us/library/xfhwa508.aspx, the API describes a dict with an alternative equality comparison. This is a different design than the TD and is for a world that is somewhat different from Python. In that dict, eq and hash are specified at the dict level rather than object level (in Python, the key objects define their own __eq__ and __hash__ rather than having the user attach the functions directly to the dictionary). * In http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html, the IdentityHashMap is described as being for rare use cases such as topology-preserving object graph transformations likes serialization or deep-copying. Looking at Python's own code for copy.deepcopy(), the TD would not be a good replacement for the existing code (more memory intensive, slower, no use for the originals dict, and no use for most of the TD functionality). It appears that it is simpler and faster to store and lookup d[id(obj)] than to use a TD. * If this were being modeled in a database, we would have one table with a many-to-one mapping of original keys to transformed keys and another table with a transformed key as the primary key in a table of key-value pairs. This suggests two lookup patterns original->tranformed->value and transformed->all_originals. * The Apache case insensitive dict documentation includes these thoughts: "This map will violate the detail of various Map and map view contracts. As a general rule, don't compare this map to other maps. In particular, you can't use decorators like ListOrderedMap on it, which silently assume that these contracts are fulfilled. --- Note that CaseInsensitiveMap is not synchronized and is not thread-safetiveMap.html --- This map will violate the detail of various Map and map view contracts. As a general rule, don't compare this map to other maps. In particular, you can't use decorators like ListOrderedMap on it, which silently assume that these contracts are fulfilled." * Using ACK to search through Django, it looks like there over a half-dozen case-insensitive lookups with are implemented using d[k.lower()] and wouldn't be better-off using the proposed TD. The notes above are rough and I still have more work to do: * close reading of the remaining links in the PEP * another round of user testing this week (this time with far more experienced engineers) * review the long python-dev thread in more detail * put detailed code suggestions on the tracker All of this will take some time. I understand that the 3.4 feature deadline is looming, but I would like to take the time to thoroughly think this through and make a good decision. If I had to choose right now, a safe choice would be to focus on the primary use case and implement a clean CaseInsensitiveDict without the double-dict first-saved case-preserving feature. That said, I find the TD to be fascinating and there's more work to do before making a decision. Hopefully, this post will make the thought process more transparent. Cheers, Raymond

On Wed, 30 Oct 2013 01:12:03 -0600, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Please be aware that the PEP author's motivation in submitting the PEP was to have a case insensitive, case *preserving* dict. The generalization serves to make the new datatype more useful, but if the end result doesn't satisfy the original use case of the author, I won't be surprised if he has no motivation to work on it further :). --David

It strikes me that there could be an alternative approach to some of the use cases discussed here. Instead of a new type of dictionary, the case-insensitivity problem could be solved with something akin to a * CaseInsensitiveString* class used for keys within a standard dictionary. This would be very similar to a normal string except with comparison and hashing. It would mean that CaseInsensitiveString("Foo") is considered equal to CaseInsensitiveString("foo") and allow code such as the following:
This would obviously also be usable in other places where case-insensitive strings are required. Just my two pence/cents/other minor currency units. Nigel On 30 October 2013 14:18, Ethan Furman <ethan@stoneleaf.us> wrote:

On 10/30/2013 09:34 AM, Nigel Small wrote:
It strikes me that there could be an alternative approach to some of the use cases discussed here. Instead of a new type of dictionary, the case-insensitivity problem could be solved with something akin to a *CaseInsensitiveString* class [...]
The nice thing about the TransformDict is it is usable for much more than simple case-insensitivity. -- ~Ethan~

True, but I could similarly argue that the nice thing about CaseInsensitiveString is it is usable for much more than dictionary keys - it just depends on your point of view. There would be nothing stopping other types of dictionary key transformation being covered by other key data types in a similar way, I'm simply trying to raise the question of where the genericity could sit: in the dictionary or in the key. Nigel On 30 October 2013 17:04, Ethan Furman <ethan@stoneleaf.us> wrote:

On Wed, 30 Oct 2013 16:34:33 +0000 Nigel Small <nigel@nigelsmall.com> wrote:
And how does a case-insensitive string compare with a normal (case-sensitive) string? This is a can of worms. (if you answer, please don't answer in this thread but open a separate one for case-insensitive strings, thanks) Regards Antoine.

And how does a case-insensitive string compare with a normal (case-sensitive) string? This is a can of worms.
I was wondering this myself. I suspect it would depend which string is on the left hand side of the comparison operator, yes? Can of worms, indeed. implicit-insensitve-i-n-ly, y'rs, Skip

Hi Raymond, On Wed, 30 Oct 2013 01:12:03 -0600 Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Thanks for the thorough status report.
First-vs-last has already been discussed in the previous thread. My initial hunch was to keep the last key, but other people made the point that first was both more compliant (with current dict behaviour) and more useful (since you can override it by deleting and then reinserting the entry). Regards Antoine.

Sorry I missed the original discussion, but isn't this a simple case of putting a decorator around the getitem method (to transform the input key) and a single line in the body of the setitem method, making this very easy adaptation of the existing dict class? Mark

On Mon, Oct 07, 2013 at 06:17:09PM -0700, Mark Janssen wrote:
Not really. We can try what you suggest to implement a case insensitive dict (decorator is not needed): py> class CaseInsensitiveDict(dict): ... def __getitem__(self, key): ... key = key.casefold() # use .lower() before Python 3.3 ... return super().__getitem__(key) ... def __setitem__(self, key, value): ... key = key.casefold() ... super().__setitem__(key, value) ... py> d = CaseInsensitiveDict() py> d['X'] = 42 py> d {'x': 42} Well, that's not exactly what I was hoping for... I was hoping that the dict would preserve case, rather than just convert it. But that's only the start of the problem: py> d['X'] # this works 42 py> d.pop('X') # expecting 42 Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'X' So no, it isn't just a matter of a trivial wrapper around __getitem__, __setitem__ and __delitem__. Check the bug tracker for more detail: http://bugs.python.org/issue18986 -- Steven

On Fri, Sep 13, 2013 at 10:40 PM, Antoine Pitrou <solipsis@pitrou.net>wrote:
Hello. Overall I think that's a great idea. Here are some questions on it though. I'm sorry if some of these have already been discussed in some other thread. 1. Thread safety. PEP doesn't mention anything about thread safety while the implementation proposed in the tracker is (very) not thread-safe. I think, PEP should mention that this class have no guarantees. 2. Extra dict. There should be a way to avoid creation of the second dict when there is no need to store original keys. For example, email.message module doesn't store original headers as they are not needed. The same applies to web frameworks parsing HTTP headers or WSGI environment. I'm sure I had another one. I'll send it once I remember. -- Kind regards, Yuriy.

On Tue, Oct 8, 2013 at 10:17 PM, MRAB <python@mrabarnett.plus.com> wrote:
If you don't need the original key, then you might as well just use a transform function with a dict.
As I understood, storing original keys is not the purpose of TransformDict, allowing hashing on something other then provided key itself is. This part presents interest for the cases I mentioned. -- Kind regards, Yuriy.

On Tue, 08 Oct 2013 22:02:43 +0400, Yuriy Taraday <yorik.sar@gmail.com> wrote:
This is not true. email.message *very carefully* preserves the original header name, case and all [*], that's part of its mandate (faithfully reproducing the original parsed message). That said, email.message can't use transformdict, since email.message needs a list-with-case-insensitive-keyed-lookup, not a dict, because it also preserves the original *order* of the headers. And it couldn't use an orderedtransformdict, either, since it also has to preserve *duplicate* headers. --David [*] currently it loses whitespace information in certain cases, but the new header parser/folder is supposed to fix that...or will once I fix a few more corner cases :)
participants (22)
-
Antoine Pitrou
-
Eli Bendersky
-
Ethan Furman
-
Georg Brandl
-
Guido van Rossum
-
Joao S. O. Bueno
-
Larry Hastings
-
Mark Janssen
-
Mark Shannon
-
MRAB
-
Nick Coghlan
-
Nigel Small
-
R. David Murray
-
Raymond Hettinger
-
Russell E. Owen
-
Ryan Gonzalez
-
Serhiy Storchaka
-
Skip Montanaro
-
Stephen J. Turnbull
-
Steven D'Aprano
-
Victor Stinner
-
Yuriy Taraday