Add a "transformdict" to collections

Hello, In http://bugs.python.org/issue18986 I proposed adding a new mapping type to the collections module. The original use case is quite common in network programming and elsewhere (Eric Snow on the tracker mentioned an application with stock symbols). You want to have an associative container which matches keys case-insensitively but also preserves the original casing (e.g. for presentation). It is a commonly reimplemented container. It is also an instance of a more general pattern: match keys after applying a derivation (or coercion) function, but at the same time keep track of the original key. Note that the derivation function needn't be (and generally won't be) bijective, otherwise it's too simple. Therefore I propose adding the general pattern. Simple example:
(case-insensitive but case-preserving, as the best filesystems are ;-)) On the tracker issue, it seems everyone agreed on the principle. There is some bikeshedding left to do, though. So here are the reasonable naming proposals so far: - transformkeydict - coercekeydict - transformdict - coercedict I have a sweet spot for "transformdict" myself. Regards Antoine.

(case-insensitive but case-preserving, as the best filesystems are ;-))
I have a sweet spot for "transformdict" myself.
Antoine, "Transform" does not remind me of "case-insensitive but case-preserving". If this is important enough to put into the collections module (I'm skeptical), shouldn't that behavior be more strongly hinted at in the name? Lot's of things are transformations. You're interested in a very specific one. Skip

2013/9/10 Antoine Pitrou <solipsis@pitrou.net>:
If it is commonly reimplemented, what is the most common name? :-) The http.client and email.message modules convert headers to lower case, but keep the original case.
- transformkeydict
Do you know a use case where values need also to be transformed? If not, I prefer the transformdict name.
- coercekeydict - coercedict
I only read "coerce" in old Python documentation, not in other languages. I prefer the more common and generic term "tranform". Victor

On Sep 10, 2013, at 12:04 PM, Victor Stinner wrote:
The http.client and email.message modules convert headers to lower case, but keep the original case.
As RDM pointed out on the tracker, email headers aren't a great use case for this because they aren't really dictionaries. They're lists with some dict-like syntax. -Barry

On Sep 10, 2013, at 03:57 PM, Antoine Pitrou wrote:
Not really. Email headers support duplicates, although the dict-syntax will only return the first such header. Alternative syntax like .get_all() gives you all of them. But anyway, don't let this stop you! I like your robots-in-disguise[1] dicts no matter what you call them, and even if email can't use them. -Barry [1] http://www.youtube.com/watch?v=59QeKkjishs

Is this just syntactic sugar for recursive lookup of a transformed version in __missing__? Or a way of supplying a custom "key" function to a dictionary? Any such proposal should consider how it composes with other dict variants like defaultdict, OrderedDict and counter. Cheers, Nick.

Le Tue, 10 Sep 2013 22:00:37 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
Is this just syntactic sugar for recursive lookup of a transformed version in __missing__?
Nope. For one, it doesn't use __missing__ at all. I think using __missing__ would be... missing the point, because it wouldn't working properly if you have e.g. X != Y such that transform(X) == Y and transform(Y) == X. (a simple example being transform = str.swapcase :-))
Or a way of supplying a custom "key" function to a dictionary?
Probably, although I'm not entirely sure what you mean by that :-)
Any such proposal should consider how it composes with other dict variants like defaultdict, OrderedDict and counter.
Well, one sticking point here is that those variants don't compose with each other already :-) But, yes, I may make another proposal with composition in mind. Regards Antoine.

On Tue, Sep 10, 2013 at 5:22 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Does anyone have an idea how to make the existing variants more composable? It would be nice to get a better understanding of this before we add another variant. Conceptually, composability makes a lot of sense (what if we want this transformdict but also in insertion order...) Eli

Le Tue, 10 Sep 2013 07:18:41 -0700, Eli Bendersky <eliben@gmail.com> a écrit :
AFAICT, the main problem with composability is that the implementation is less simple and it removes many opportunities for optimization. It may or may not be a problem, depending on your use cases. (I'm playing a bit with a TransformMap which is a composable version of transformdict, and it's clear the implementation is less amenable to optimization tweaks and shortcuts: in fact, I have to *add* some logic to e.g. cater with defaultdict) Regards Antoine.

On 10 September 2013 13:00, Nick Coghlan <ncoghlan@gmail.com> wrote:
Not quite, because the dict should preserve the originally entered key.
This actually prompts the question, what should the following produce:
['FOO'] or ['foo']? Both answers are justifiable. Both are possibly even useful depending on context... Paul

Le Tue, 10 Sep 2013 13:24:29 +0100, Paul Moore <p.f.moore@gmail.com> a écrit :
I think it would be best to leave it as an implementation detail, because whichever is easiest to implement depends on the exact implementation choices (e.g. C vs. Python). (my prototypes right now conserve the last __setitem__ key, not the first) Regards Antoine.

Am 10.09.13 14:35, schrieb Antoine Pitrou:
I think this is the point where the datatype is *not* clearly straight-forward, and thus deserves a PEP. I think it would be a flaw to have this detail implementation-defined. This would be like saying that it is implementation-defined which of A,B,C is returned from "A and B and C" if all are true. Regards, Martin

On Tue, 10 Sep 2013 14:44:01 +0200 "Martin v. Löwis" <martin@v.loewis.de> wrote:
Not saying you're necessary wrong, but a few data points: - defaultdict was added without a PEP: http://bugs.python.org/issue1433928 - Counter was added without a PEP: http://bugs.python.org/issue1696199 - ChainMap was added without a PEP: http://bugs.python.org/issue11089 http://bugs.python.org/issue11297 - namedtuple was added without a PEP: https://mail.python.org/pipermail/python-dev/2007-February/071196.html (no tracker reference for its addition) OrderedDict, however, came with a PEP: http://www.python.org/dev/peps/pep-0372/
Ok, it seems everyone (except me :-)) agrees that it should return the first key value, so that's how it will be. Regards Antoine.

On 10 September 2013 19:31, Antoine Pitrou <solipsis@pitrou.net> wrote:
If you retain the first key value, it's easy enough for the application to implement "retain the last" semantics: try: del d[k] finally: d[k] = v If you provide "retain the last", I can't see any obvious way of implementing "retain the first" in application code without in effect reimplementing the class. Paul

On 10 September 2013 18:06, Antoine Pitrou <solipsis@pitrou.net> wrote:
Well, I'd expect it to simply be there. I had not thought of other usecases for the transformdict itself - but if I would use it and would need the original key without such a method it would not be trivial to get it. For example, in latim languages it is common to want accented letters to match their unaccented counterparts - pick my own first name "João" - if I'd use a transform to strip the diactriticals, and have an user input "joao" - it would match, as intended - but I would not be able to retrieve the accented version without re-implementing the class behavior. js -><-

On 9/10/2013 2:46 PM, Antoine Pitrou wrote:
But they don't change the keys (although numbers have different representations on occasion). One use of transformdict might be to allow use of non-hashable items as keys, by extracting an actual key from the internals of the non-hashable item. The key may be sufficiently unique to enable use of the dict structure for lookups, but it would certainly be handy to obtain the actual item again. Without a canonical lookup feature, one would be forced to also include the key as part of the value, or some such hack. I also thought João's example was a very practical reason to have the canonical lookup feature, by some name or another.

Le Tue, 10 Sep 2013 15:09:56 +0200, Hrvoje Niksic <hrvoje.niksic@avl.com> a écrit :
It's not that obvious. It's not common to rely on that aspect of dict semantics, because you will usually lookup using the exact same type, not a compatible one. I would expect very little code, if any, to rely on this. (also, I would intuitively expect the latest key to be held, not the first one, since that's what happens for values.) Regards Antoine.

On 9/10/2013 10:54 AM, Antoine Pitrou wrote:
I intuitively expected, and I think most often would want, the first key to be held. The reason being that I would except most often the initial insertion to be the canonical form. Whichever way is adopted, I think even if you say it's implementation specific, people will end up relying on it in their code. Janzert

On 09/10/2013 11:54 AM, Janzert wrote:
My stock ticker example (but from a real problem I had a few days ago) would want to have this example: I want the first key, because I'm pre-seeding the dict with a variety of keys that are of the canonical form.
Whichever way is adopted, I think even if you say it's implementation specific, people will end up relying on it in their code.
Agreed. Eric.

On 10/09/2013 10:28am, Antoine Pitrou wrote:
I guess another example is creating an "identity dict" (see http://code.activestate.com/lists/python-ideas/7161/) by doing d = transformdict(id) -- Richard

Could a more generic variant of this class work? In the same way that `sorted` can accept a comparison function, similar could be done for a dictionary-like class: d = transformdict(key=str.lower) Strictly speaking, this would provide case-insensitive but not case-preserving behaviour. For any given use case, though, a function could instead be supplied to "normalise" the key (upper, lower, title case, etc) in a way that fits that case. I can't think of many real cases where multiple types of capitalisation would be useful within the same dictionary. Nigel On 10 September 2013 15:40, Richard Oudkerk <shibturn@gmail.com> wrote:

On Sep 10, 2013, at 4:28 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
In http://bugs.python.org/issue18986 I proposed adding a new mapping type to the collections module.
I would *really* like for this to start outside the standard library. It needs to mature with user feedback before being dumped in the collections module (which was never intended to be a giant pile of every collection a person could think of). Adding yet more dictionary variants is an example of way-too-many-ways-to-do-it. Raymond

Le Tue, 10 Sep 2013 21:40:36 -0500, Raymond Hettinger <raymond.hettinger@gmail.com> a écrit :
From a quick search: - case-insensitive dicts (use cases and implementation attempts): http://twistedmatrix.com/documents/current/api/twisted.python.util.Insensiti... https://mail.python.org/pipermail/python-list/2013-May/647243.html https://mail.python.org/pipermail/python-list/2005-April/296208.html https://mail.python.org/pipermail/python-list/2004-June/241748.html http://bugs.python.org/msg197376 http://stackoverflow.com/a/2082169 http://stackoverflow.com/a/3296782 http://code.activestate.com/recipes/66315-case-insensitive-dictionary/ https://gist.github.com/babakness/3901174 http://www.wikier.org/blog/key-insensitive-dictionary-in-python http://en.sharejs.com/python/14534 http://www.voidspace.org.uk/python/archive.shtml#caseless - identity dicts: https://mail.python.org/pipermail/python-ideas/2010-May/007235.html http://www.gossamer-threads.com/lists/python/python/209527 Python's own pickle module: http://hg.python.org/cpython/file/0e70bf1f32a3/Lib/pickle.py#l234
Well, thanks for the reminder, I was *indeed* going to dump all the collections I could think of in the collections module :-) (that would have been embarassing!) Seriously, I'm curious: what needs to mature, according to you? The proposed collection is a plain MutableMapping with the single addition of transforming the key on lookup. The use cases are well-defined and well-known. If you have any concrete questions or concerns then please offer them here.
Adding yet more dictionary variants is an example of way-too-many-ways-to-do-it.
So what is your proposal for what is definitely (see examples above, and this thread's and the tracker's responses) a very common need? Keep letting people write suboptimal, incomplete, buggy versions of the same thing? Thanks Antoine.

Seriously, I'm curious: what needs to mature, according to you?
In my mind, its availability on PyPI along with demonstrated use in the wild (plus corresponding votes to demonstrate that people use/like it) would help. That you can find several implementations at this doesn't mean it's necessarily worth adding to the std lib. Once in, it is very difficult to evict something that is later deemed not to have belonged in the std lib, so I think some extra scrutiny is worthwhile. Is there some obvious advantage to having a single API for this available to all Python applications? Are the semantics well-defined (do all the implementations you cited offer basically the same semantics)? The discussion so far here suggest that the best semantics might not be completely settled. (I still don't care for the name. "Transform" != "case folding" in my mind. A quick scan of your links suggests most people think something like "cidict" or "CaseInsensitiveDict" would be more descriptive.) Skip

On Wed, 11 Sep 2013 06:08:25 -0500, Skip Montanaro <skip@pobox.com> wrote:
The problem is that it is hard to get traction on pypi for a "small" feature like this. Most people will just implement a special purpose version that solves their specific need (possibly in a somewhat broken fashion) rather than add yet another dependency. Our approach in cases like this has (I think) been to learn from the existing implementations and build our own based on that (isn't that what happened with OrderedDict?)
I think the only question was what happens to the key on assignment, and that has been settled. Well, I guess there's also a question of how you create one, but that's not a question a pypi version would be any help in resolving: we'd still bikeshed it upon addition to the stdlib. In fact, I would not be surprised if *all* of the bikeshedding we are doing now (except this bit :) would take place if an existing pypi module for this were proposed for addition. I think the bigger question is composition of different dict types, and that is again something that isn't going to be solved by a pypi module.
Except it is wider than that: the transform function can be anything, not just case folding. I suggested surjectiondict or ontodict, but Antoine didn't like those :) (I had to look up the terms...it's been a long time since I studied math.) --David

On 11 September 2013 21:57, R. David Murray <rdmurray@bitdance.com> wrote:
I'll join the chorus requesting that this live on PyPI for a while first. I think this is a case similar to what happened with contextlib.ExitStack: I'm not sure if anyone actually *used* contextlib2 for anything significant (if they did, they didn't tell me about it), but just going through the process of properly documenting, publishing and testing it as a relatively independent project forced *me* to think through the design. Add in a couple of interesting conversation with Brandon Rhodes and others about the API design, and the end result was a *vast* improvement over what originally went up on PyPI. The barriers to making use of PyPI libraries are also falling with time, so a solid implementation may still see adoption, even if it's in the form of copy-and-paste programming rather than actual dependencies. I think there are also additional API decisions to be made beyond just the one about how to declare the mapping function, related to how to get the mapped key values *out*, as well as how to find out whether or not two potential key values map to the same actual key. As my preferred bikeshed colour, I'm going to suggest "MappedKeyDict" (using a similar naming style to OrderedDict), since the purpose of the container is to map the domain of the supplied keys to a different range before doing the value lookup. Suggested additional methods: md.map_key(key) # Applies the mapping function to the supplied key md.mapped_keys() # Like keys(), but with the key mapping function applied md.mapped_items() # Like items(), but with the key mapping function applied Another (more dubious) possible method: md.same_key(key1, key2) # "md.map_key(key1) == md.map_key(key2)" Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Le Wed, 11 Sep 2013 22:47:12 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
ExitStack was quite a new thing, API-wise. The proposal here is to generalize something which already exists in various forms all over the Internet, and respecting a well-known API, MutableMapping. There are not many possible APIs to create case-insensitive dicts, or identity dicts. The only API contention until now has been about whether the first or last key would be retained (which settled on the former), and Serhiy's three instantiation schemes. The additional methods you suggest may be nice to have, but they are not necessary and (deliberately) not part of the original proposal. That is, they can be grafted later on.
As my preferred bikeshed colour, I'm going to suggest "MappedKeyDict" (using a similar naming style to OrderedDict),
Ha, another colour! Thanks :-) Regards Antoine.

Antoine Pitrou writes:
What Nick said was "I was too close to the design, and time and a very few good comments greatly improved the product." That's worth considering as generic advice rather than for the specifics of the case he faced compared to yours. Which modules in the stdlib would benefit from rewriting using "transformdict"? How about on PyPI?

Le Wed, 11 Sep 2013 22:50:08 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> a écrit :
"Time and comments" is why I'm posting a discussion thread here and waiting for responses. There are currently 34517 packages on PyPI. A new 34518th one won't magically get a lot of attention :-) (for example, I got very few comments when posting pathlib on PyPI - despite making several actual releases there -, compared to what I got from the PEP 428 discussion on python-ideas) Regards Antoine.

11.09.13 16:50, Stephen J. Turnbull написав(ла):
Which modules in the stdlib would benefit from rewriting using "transformdict"? How about on PyPI?
At least _threading_local, cProfile, doctest, json, and perhaps future implementations of __sizeof__ for some classes would benefit from rewriting using IdentityDict. Unfortunately copy and pickle expose their replacements of IdentityDict to public and can't change them. I don't known anything about general TransformDict use cases.

Am 11.09.13 15:04, schrieb Antoine Pitrou:
There are not many possible APIs to create case-insensitive dicts, or identity dicts.
That is certainly not true. Most obviously, you have the choice of a specialized case-mapping dict, or a generalized type that can be used for case mapping also. Does any of the referenced use cases actually use the API that you are proposing (i.e. with a key transformation function)? FWIW, I can think of yet another API for this: write a Proxy class class TransformedKey: # untested def __init__(self, original): self.original = original self.transformed = self.transform(original) def __hash__(self): return hash(self.transformed) def __eq__(self, other): return self.transformed == other.transformed def transform(self, key): raise NotImplementedError If there is then disciplined use in the form d[TransformedKey(key)] == value then you can use the existing dictionary type. Notice that this can do both case-insensitive and identity dicts (plus there are multiple choices of getting the transform function into it, as well as variations of the __eq__ implementation). There is evidence in this thread that people "grasp" case-insensitive more easily than the generalized API. For the record, I initially typed two responses into the tracker which I found to be incorrect before posting them, so I ended up posting neither. The "transformdict" type is not at all "natural", even though it may be useful. If you really want to "push" this API into 3.4, I think you will need to write a PEP, and find a PEP dictator who is willing to approve it. As you seem to dislike the idea of writing a PEP, I suggest to follow the idea of publishing it on PyPI now, and then proposing it for inclusion into 3.5. Regards, Martin

On Wed, 11 Sep 2013 19:31:56 +0200 "Martin v. Löwis" <martin@v.loewis.de> wrote:
Well, when you have a specialized need, you don't implement a generic version (*), so I don't think anybody implemented it ;-). The point of a generic API is to help replace all specialized implementations at once, rather than a small subset of them. (*) even the Twisted people don't seem to go that far
The problem is "disciplined use" here. This means any consumer of your API has to know about that quirk, so it's an inferior solution. (TransformedKey could be an internal detail of the implementation, as with weak dictionaries; but as a public API it doesn't seem very desirable)
The "transformdict" type is not at all "natural", even though it may be useful.
By that token, nothing is "natural" ;-) defaultdict isn't natural, yet it went in without a PEP. However, to all persons who already had that need and responded to the proposal, the generic version *does* seem natural enough, AFAICT. Only people who've never had such a need seem to fail to "grasp" it (and I only counted one such person, but my memory may be failing me). You know, I'm not against having "only" a case-insensitive dict. But that would be less useful all the while not being significantly easier to use, so I'm not really seeing the point.
What I dislike is the idea of doing additional work because some barriers are imposed ;-). PEP or PyPI are on a similar scale here. At least a PEP would help record the various discussion details, so I'd favour that over the PyPI path. Regards Antoine.

On Wed, Sep 11, 2013 at 10:47:12PM +1000, Nick Coghlan wrote:
I'll join the chorus requesting that this live on PyPI for a while first.
Another alternative would be ActiveState's Python recipes: http://code.activestate.com/recipes/langs/python which is a lot lighter weight than creating a package on PyPI. I believe that ChainMap, namedtuple, groupby, lru_cache and even math.fsum started life on ActiveState. -- Steven

On 09/11/2013 05:47 AM, Nick Coghlan wrote:
Personally, I would not add a PyPI dependency for a single object like transformdict. If we are that nervous about it we can make it provisional, but I don't see that as necessary.
I had the opposite experience. Going through PyDev to get Enum was *far* more educational, informative, and useful than creating an independent package. Plus, transformdict is not that new or different from existing dicts, and most decisions (which key to keep? how to initialize?) can be answered by simply being consistent with what we already have. -- ~Ethan~

On Wed, Sep 11, 2013 at 06:08:25AM -0500, Skip Montanaro wrote:
But the proposal is not for a case-insensitive dict. It is more general than that, with case-insensitivity just one specific use-case for such a transformative dict. Arguably the most natural, or at least obvious, such transformation, but there are others. I have code that does something like this: MAPPING = {'spam': 23, 'ham': 42, 'eggs': 17} result = MAPPING[key.strip()] # later... answer = MAPPING[key] # Oops, forgot to strip! This is broken. Using Antoine's proposal: MAPPING = TransformDict(str.strip) MAPPING.update({'spam': 23, 'ham': 42, 'eggs': 17}) result = MAPPING[key] # later... answer = MAPPING[key] # Fine now. so that the mapping handles stripping the keys, not the caller. This isn't just about strings, and certainly not just case-insensitivity. -- Steven

2013/9/11 Steven D'Aprano <steve@pearwood.info>:
For this use case, you should not keep the key unchanged, but it's better to store the stripped key (so MAPPING.keys() gives you the expected result). The transformdict type proposed by Antoine cannot be used for this use case. The os.environ mapping uses a subprocess of MutableMapping which accepts 4 functions: encoder/decoder for the key and encoder/decoder for the value. Such type is even more generic. transformdict cannot replace os._Environ. (Or do you prefer to store the ' eggs' key?) Victor

On 09/11/2013 06:58 AM, Victor Stinner wrote:
He isn't keeping the key unchanged (notice no white space in MAPPING), he's merely providing a function that will automatically strip the whitespace from key lookups.
The transformdict type proposed by Antoine cannot be used for this use case.
Yes, it can. --> from collections import transformdict --> MAPPING = transformdict(str.strip) --> MAPPING.update({'spam': 23, 'ham': 42, 'eggs': 17}) --> MAPPING transformdict(<method 'strip' of 'str' objects>, {'ham': 42, 'spam': 23, 'eggs': 17}) --> MAPPING[' eggs '] 17 -- ~Ethan~

2013/9/11 Ethan Furman <ethan@stoneleaf.us>:
transformdict keeps the key unchanged, see the first message:
'Foo' is stored as 'Foo', not as 'foo'. So for stripped keys: d=transformdict(str.strip); d[' abc ']; print(list(d)) should print "[' abc ']", not "['abc']". Is it the expected result? Victor

On 09/11/2013 08:49 AM, Victor Stinner wrote:
And indeed it does: Python 3.4.0a1+ (default:833246d42825+, Aug 31 2013, 14:17:59) [GCC 4.7.3] on linux Type "help", "copyright", "credits" or "license" for more information. --> from collections import transformdict --> d=transformdict(str.strip); d[' abc '] = 42; print(list(d)) [' abc ']
Is it the expected result?
Yup! :) -- ~Ethan~

On 12 September 2013 02:03, Ethan Furman <ethan@stoneleaf.us> wrote:
That seems backwards to me. I would think that retrieving the keys from the dict would return the transformed keys (I'd call them canonical keys). That way there's no question about which key is stored - it's *always* the transformed key. In fact, I think this might get more traction if it were referred to as a canonicalising dictionary (bikeshedding, I know). Tim Delaney

On 09/11/2013 02:39 PM, Tim Delaney wrote:
At this point there is still no question: it's the first version of the key seen. For a stupid example: --> d = transformdict(str.lower) --> d['ThePyramid'] = 'Game Show' --> d['AtOnce'] = now() --> for k, v in d.items(): ... print(k, v) Imagine writing a function to get that capitalization right.
In fact, I think this might get more traction if it were referred to as a canonicalising dictionary (bikeshedding, I know).
Whoa, that's way harder to spell! ;) Drop the 'ising', though, and I'm in. -- ~Ethan~

On 09/11/2013 02:39 PM, Tim Delaney wrote:
I would think that retrieving the keys from the dict would return the transformed keys (I'd call them canonical keys).
The more I think about this the more I agree. A canonicaldict with a key function that simply stored the transformed key and it's value would seem to be a lot simpler: - no need to store a separate "presentation" key - no confusion about which of the first key/last key seen is stored - no mistakes with the "first" key not being added before real data and getting the presentation key wrong Further, in order to store the non-canonical keys a separate list must be kept of the keys to preseed the canonicaldict; if we store the canonical keys a separate list must be kept for presentation purposes -- so worst case scenario we're keeping the same amount of information and best-case scenario the presentation of the keys doesn't matter and we just saved ourselves an extra data structure. -- ~Ethan~

Le Thu, 12 Sep 2013 07:08:47 -0700, Ethan Furman <ethan@stoneleaf.us> a écrit :
And it wouldn't solve the use cases. What's the point?
Further, in order to store the non-canonical keys a separate list must be kept of the keys to preseed the canonicaldict;
Yeah, so this is totally silly. What you're basically saying is "we don't need TransformDict since people can re-implement it themselves". Regards Antoine.

On 09/12/2013 07:43 AM, Antoine Pitrou wrote:
Yeah, so this is totally silly. What you're basically saying is "we don't need TransformDict since people can re-implement it themselves".
No, what I'm saying is that the "case-preserving" aspect of transformdict is silly. The main point of transformdict is to enable, for example, 'IBM', 'Ibm', and 'ibm' to all match up as the same key. But why? Because you don't trust the user data. And if you don't trust the user data you have to add the correct version of the key yourself before you ever process that data, which means you already have the correct version stored somewhere. -- ~Ethan~

Le Thu, 12 Sep 2013 08:05:44 -0700, Ethan Furman <ethan@stoneleaf.us> a écrit :
That's assuming there is an a priori "correct" version. But there might not be any. Keeping the original key is important for different reasons depending on the use case: - for case-insensitive dicts, you want to keep the original key for presentation, logging and debugging purposes (*) - for identity dicts, the original key is mandatory because the id() value in itself is completely useless, it's just used for matching (*) For a well-known example of such behaviour, think about Windows filesystems. Regards Antoine.

On 13 September 2013 01:40, Antoine Pitrou <solipsis@pitrou.net> wrote:
In this case though, there are two pieces of information: 1. A canonical key (which may or may not equal the original key); 2. The original key. It seems to me then that TransformDict is a specialised case of CanonicalDict, where the canonical key is defined to be the first key inserted. It would in fact be possible (though inefficient) to implement that using a canonicalising callable that maintained state - something like (untested): class OriginalKeys: def __init__(self):: self.keys = CanonicalDict(str.lower) def __call__(self, key): return self.keys.setdefault(key, key) class OriginalKeyDict(CanonicalDict): def __init__(self):: super().__init__(OriginalKeys()) Tim Delaney

On 13 September 2013 07:29, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
Bah - got myself mixed up with original key and case preserving there ... try this: class OriginalKeys: def __init__(self, func):: self.keys = CanonicalDict(func) def __call__(self, key): return self.keys.setdefault(key, key) class OriginalKeyDict(CanonicalDict): def __init__(self, func):: super().__init__(OriginalKeys(func)) class IdentityDict(OriginalKeyDict): def __init__(self): super().__init__(id) class CasePreservingDict(OriginalKeyDict): def __init__(self): super().__init__(str.lower) Tim Delaney

On Thu, 12 Sep 2013 08:05:44 -0700, Ethan Furman <ethan@stoneleaf.us> wrote:
No, in the *(original use case* (and many other use cases, such as the identity dict) we *do* trust the user data. It is more trustworthy than the key forms used to look up the data inside the program. That's the whole point. (Don't be distracted by Eric's particular use case.) But even for the general case...well, let me repost here what I said on the tracker: In our view, this data structure is for cases where the original key is the most important piece of information (about the keys). The transformation in the lookup process is entirely in the service of looking up the value paired with that original key when there is more than one possible representation of that key. It is the original key that is critical when re-serializing the data or otherwise making use of the keys for anything other than lookup. So this is about making the data structure succinctly model the problem domain, which is what OO is supposed to be good at :) --David

On 09/11/2013 06:58 AM, Victor Stinner wrote:
True, it's more generic -- but is it used more often? Or often enough to have those four functions be part of transformdict instead of just the one encodekey function? (I mean the idea of os._Environ, not that specific implementation.) Personally, I wouldn't mind having all four; for one thing, the name 'transformdict' would then be entirely appropriate. ;) -- ~Ethan~

Le Wed, 11 Sep 2013 07:48:56 -0700, Ethan Furman <ethan@stoneleaf.us> a écrit :
The key decoder function is quite useless since the original key is retained. As for the value encoder/decoder, I don't really see the point: just store whichever value you want to retrieve later. The main point of transformdict is that the *lookup* is done using a different key than the user-provided key. Regards Antoine.

There is a question about specifying the transform function. There are three ways to do this: 1. Positional argument of the constructor. d = TransformDict(str.casefold, Foo=5) 2. Subclassing. class CaseInsensitiveDict(TransformDict): def transform(self, key): return key.casefold() d = CaseInsensitiveDict(Foo=5) 3. Type generator. d = TransformDict(str.casefold)(Foo=5) Second method looks a little simpler to me from implementation side (What if you call __init__ for already initialized object? What if you used the object before calling __init__?). Third method allows you to customize other aspects of dict behavior (combine OrderedDict, defaultdict,..). First method looks less cumbersome from user side at first look. But how many different transform functions you use? Aren't they deserve named classes?

Le Wed, 11 Sep 2013 12:38:13 +0300, Serhiy Storchaka <storchaka@gmail.com> a écrit :
I thought about this first, and then I remembered that python-dev isn't generally very keen on subclassing-based APIs :-)
3. Type generator.
d = TransformDict(str.casefold)(Foo=5)
[...]
Third method allows you to customize other aspects of dict behavior (combine OrderedDict, defaultdict,..).
Well, no, it's not that easy. Especially since OrderedDict and defaultdict weren't written with combination in mind. Regards Antoine.

11.09.13 12:52, Antoine Pitrou написав(ла):
Why? This way looks most natural to me.
They can be rewritten. Actually the defaultdict is just simple wrapper around existing functionality of the __missing__ method. We can add the __transform__ method directly in the dict class. I think it will significant (2-3x) decrease a size of needed C code (no need in defining new type with constructors/destructors/etc, only lookup_transform).

Le Wed, 11 Sep 2013 13:37:53 +0300, Serhiy Storchaka <storchaka@gmail.com> a écrit :
Right now I am not proposing any C implementation of TransformDict. That could be added, but it's not what I'm pushing for here :-) But I don't think the "type generator" API would be easier to implement in C, anyway. Regards Antoine.

On 09/11/2013 02:38 AM, Serhiy Storchaka wrote:
This method follows the precedent of defaultdict: --> from collections import defaultdict --> l = defaultdict(list, foo=['ham','eggs']) --> l defaultdict(<class 'list'>, {'foo': ['ham', 'eggs']}) Without a compelling reason to change, we should keep it consistent. -- ~Ethan~

(case-insensitive but case-preserving, as the best filesystems are ;-))
I have a sweet spot for "transformdict" myself.
Antoine, "Transform" does not remind me of "case-insensitive but case-preserving". If this is important enough to put into the collections module (I'm skeptical), shouldn't that behavior be more strongly hinted at in the name? Lot's of things are transformations. You're interested in a very specific one. Skip

2013/9/10 Antoine Pitrou <solipsis@pitrou.net>:
If it is commonly reimplemented, what is the most common name? :-) The http.client and email.message modules convert headers to lower case, but keep the original case.
- transformkeydict
Do you know a use case where values need also to be transformed? If not, I prefer the transformdict name.
- coercekeydict - coercedict
I only read "coerce" in old Python documentation, not in other languages. I prefer the more common and generic term "tranform". Victor

On Sep 10, 2013, at 12:04 PM, Victor Stinner wrote:
The http.client and email.message modules convert headers to lower case, but keep the original case.
As RDM pointed out on the tracker, email headers aren't a great use case for this because they aren't really dictionaries. They're lists with some dict-like syntax. -Barry

On Sep 10, 2013, at 03:57 PM, Antoine Pitrou wrote:
Not really. Email headers support duplicates, although the dict-syntax will only return the first such header. Alternative syntax like .get_all() gives you all of them. But anyway, don't let this stop you! I like your robots-in-disguise[1] dicts no matter what you call them, and even if email can't use them. -Barry [1] http://www.youtube.com/watch?v=59QeKkjishs

Is this just syntactic sugar for recursive lookup of a transformed version in __missing__? Or a way of supplying a custom "key" function to a dictionary? Any such proposal should consider how it composes with other dict variants like defaultdict, OrderedDict and counter. Cheers, Nick.

Le Tue, 10 Sep 2013 22:00:37 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
Is this just syntactic sugar for recursive lookup of a transformed version in __missing__?
Nope. For one, it doesn't use __missing__ at all. I think using __missing__ would be... missing the point, because it wouldn't working properly if you have e.g. X != Y such that transform(X) == Y and transform(Y) == X. (a simple example being transform = str.swapcase :-))
Or a way of supplying a custom "key" function to a dictionary?
Probably, although I'm not entirely sure what you mean by that :-)
Any such proposal should consider how it composes with other dict variants like defaultdict, OrderedDict and counter.
Well, one sticking point here is that those variants don't compose with each other already :-) But, yes, I may make another proposal with composition in mind. Regards Antoine.

On Tue, Sep 10, 2013 at 5:22 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Does anyone have an idea how to make the existing variants more composable? It would be nice to get a better understanding of this before we add another variant. Conceptually, composability makes a lot of sense (what if we want this transformdict but also in insertion order...) Eli

Le Tue, 10 Sep 2013 07:18:41 -0700, Eli Bendersky <eliben@gmail.com> a écrit :
AFAICT, the main problem with composability is that the implementation is less simple and it removes many opportunities for optimization. It may or may not be a problem, depending on your use cases. (I'm playing a bit with a TransformMap which is a composable version of transformdict, and it's clear the implementation is less amenable to optimization tweaks and shortcuts: in fact, I have to *add* some logic to e.g. cater with defaultdict) Regards Antoine.

On 10 September 2013 13:00, Nick Coghlan <ncoghlan@gmail.com> wrote:
Not quite, because the dict should preserve the originally entered key.
This actually prompts the question, what should the following produce:
['FOO'] or ['foo']? Both answers are justifiable. Both are possibly even useful depending on context... Paul

Le Tue, 10 Sep 2013 13:24:29 +0100, Paul Moore <p.f.moore@gmail.com> a écrit :
I think it would be best to leave it as an implementation detail, because whichever is easiest to implement depends on the exact implementation choices (e.g. C vs. Python). (my prototypes right now conserve the last __setitem__ key, not the first) Regards Antoine.

Am 10.09.13 14:35, schrieb Antoine Pitrou:
I think this is the point where the datatype is *not* clearly straight-forward, and thus deserves a PEP. I think it would be a flaw to have this detail implementation-defined. This would be like saying that it is implementation-defined which of A,B,C is returned from "A and B and C" if all are true. Regards, Martin

On Tue, 10 Sep 2013 14:44:01 +0200 "Martin v. Löwis" <martin@v.loewis.de> wrote:
Not saying you're necessary wrong, but a few data points: - defaultdict was added without a PEP: http://bugs.python.org/issue1433928 - Counter was added without a PEP: http://bugs.python.org/issue1696199 - ChainMap was added without a PEP: http://bugs.python.org/issue11089 http://bugs.python.org/issue11297 - namedtuple was added without a PEP: https://mail.python.org/pipermail/python-dev/2007-February/071196.html (no tracker reference for its addition) OrderedDict, however, came with a PEP: http://www.python.org/dev/peps/pep-0372/
Ok, it seems everyone (except me :-)) agrees that it should return the first key value, so that's how it will be. Regards Antoine.

On 10 September 2013 19:31, Antoine Pitrou <solipsis@pitrou.net> wrote:
If you retain the first key value, it's easy enough for the application to implement "retain the last" semantics: try: del d[k] finally: d[k] = v If you provide "retain the last", I can't see any obvious way of implementing "retain the first" in application code without in effect reimplementing the class. Paul

On 10 September 2013 18:06, Antoine Pitrou <solipsis@pitrou.net> wrote:
Well, I'd expect it to simply be there. I had not thought of other usecases for the transformdict itself - but if I would use it and would need the original key without such a method it would not be trivial to get it. For example, in latim languages it is common to want accented letters to match their unaccented counterparts - pick my own first name "João" - if I'd use a transform to strip the diactriticals, and have an user input "joao" - it would match, as intended - but I would not be able to retrieve the accented version without re-implementing the class behavior. js -><-

On 9/10/2013 2:46 PM, Antoine Pitrou wrote:
But they don't change the keys (although numbers have different representations on occasion). One use of transformdict might be to allow use of non-hashable items as keys, by extracting an actual key from the internals of the non-hashable item. The key may be sufficiently unique to enable use of the dict structure for lookups, but it would certainly be handy to obtain the actual item again. Without a canonical lookup feature, one would be forced to also include the key as part of the value, or some such hack. I also thought João's example was a very practical reason to have the canonical lookup feature, by some name or another.

Le Tue, 10 Sep 2013 15:09:56 +0200, Hrvoje Niksic <hrvoje.niksic@avl.com> a écrit :
It's not that obvious. It's not common to rely on that aspect of dict semantics, because you will usually lookup using the exact same type, not a compatible one. I would expect very little code, if any, to rely on this. (also, I would intuitively expect the latest key to be held, not the first one, since that's what happens for values.) Regards Antoine.

On 9/10/2013 10:54 AM, Antoine Pitrou wrote:
I intuitively expected, and I think most often would want, the first key to be held. The reason being that I would except most often the initial insertion to be the canonical form. Whichever way is adopted, I think even if you say it's implementation specific, people will end up relying on it in their code. Janzert

On 09/10/2013 11:54 AM, Janzert wrote:
My stock ticker example (but from a real problem I had a few days ago) would want to have this example: I want the first key, because I'm pre-seeding the dict with a variety of keys that are of the canonical form.
Whichever way is adopted, I think even if you say it's implementation specific, people will end up relying on it in their code.
Agreed. Eric.

On 10/09/2013 10:28am, Antoine Pitrou wrote:
I guess another example is creating an "identity dict" (see http://code.activestate.com/lists/python-ideas/7161/) by doing d = transformdict(id) -- Richard

Could a more generic variant of this class work? In the same way that `sorted` can accept a comparison function, similar could be done for a dictionary-like class: d = transformdict(key=str.lower) Strictly speaking, this would provide case-insensitive but not case-preserving behaviour. For any given use case, though, a function could instead be supplied to "normalise" the key (upper, lower, title case, etc) in a way that fits that case. I can't think of many real cases where multiple types of capitalisation would be useful within the same dictionary. Nigel On 10 September 2013 15:40, Richard Oudkerk <shibturn@gmail.com> wrote:

On Sep 10, 2013, at 4:28 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
In http://bugs.python.org/issue18986 I proposed adding a new mapping type to the collections module.
I would *really* like for this to start outside the standard library. It needs to mature with user feedback before being dumped in the collections module (which was never intended to be a giant pile of every collection a person could think of). Adding yet more dictionary variants is an example of way-too-many-ways-to-do-it. Raymond

Le Tue, 10 Sep 2013 21:40:36 -0500, Raymond Hettinger <raymond.hettinger@gmail.com> a écrit :
From a quick search: - case-insensitive dicts (use cases and implementation attempts): http://twistedmatrix.com/documents/current/api/twisted.python.util.Insensiti... https://mail.python.org/pipermail/python-list/2013-May/647243.html https://mail.python.org/pipermail/python-list/2005-April/296208.html https://mail.python.org/pipermail/python-list/2004-June/241748.html http://bugs.python.org/msg197376 http://stackoverflow.com/a/2082169 http://stackoverflow.com/a/3296782 http://code.activestate.com/recipes/66315-case-insensitive-dictionary/ https://gist.github.com/babakness/3901174 http://www.wikier.org/blog/key-insensitive-dictionary-in-python http://en.sharejs.com/python/14534 http://www.voidspace.org.uk/python/archive.shtml#caseless - identity dicts: https://mail.python.org/pipermail/python-ideas/2010-May/007235.html http://www.gossamer-threads.com/lists/python/python/209527 Python's own pickle module: http://hg.python.org/cpython/file/0e70bf1f32a3/Lib/pickle.py#l234
Well, thanks for the reminder, I was *indeed* going to dump all the collections I could think of in the collections module :-) (that would have been embarassing!) Seriously, I'm curious: what needs to mature, according to you? The proposed collection is a plain MutableMapping with the single addition of transforming the key on lookup. The use cases are well-defined and well-known. If you have any concrete questions or concerns then please offer them here.
Adding yet more dictionary variants is an example of way-too-many-ways-to-do-it.
So what is your proposal for what is definitely (see examples above, and this thread's and the tracker's responses) a very common need? Keep letting people write suboptimal, incomplete, buggy versions of the same thing? Thanks Antoine.

Seriously, I'm curious: what needs to mature, according to you?
In my mind, its availability on PyPI along with demonstrated use in the wild (plus corresponding votes to demonstrate that people use/like it) would help. That you can find several implementations at this doesn't mean it's necessarily worth adding to the std lib. Once in, it is very difficult to evict something that is later deemed not to have belonged in the std lib, so I think some extra scrutiny is worthwhile. Is there some obvious advantage to having a single API for this available to all Python applications? Are the semantics well-defined (do all the implementations you cited offer basically the same semantics)? The discussion so far here suggest that the best semantics might not be completely settled. (I still don't care for the name. "Transform" != "case folding" in my mind. A quick scan of your links suggests most people think something like "cidict" or "CaseInsensitiveDict" would be more descriptive.) Skip

On Wed, 11 Sep 2013 06:08:25 -0500, Skip Montanaro <skip@pobox.com> wrote:
The problem is that it is hard to get traction on pypi for a "small" feature like this. Most people will just implement a special purpose version that solves their specific need (possibly in a somewhat broken fashion) rather than add yet another dependency. Our approach in cases like this has (I think) been to learn from the existing implementations and build our own based on that (isn't that what happened with OrderedDict?)
I think the only question was what happens to the key on assignment, and that has been settled. Well, I guess there's also a question of how you create one, but that's not a question a pypi version would be any help in resolving: we'd still bikeshed it upon addition to the stdlib. In fact, I would not be surprised if *all* of the bikeshedding we are doing now (except this bit :) would take place if an existing pypi module for this were proposed for addition. I think the bigger question is composition of different dict types, and that is again something that isn't going to be solved by a pypi module.
Except it is wider than that: the transform function can be anything, not just case folding. I suggested surjectiondict or ontodict, but Antoine didn't like those :) (I had to look up the terms...it's been a long time since I studied math.) --David

On 11 September 2013 21:57, R. David Murray <rdmurray@bitdance.com> wrote:
I'll join the chorus requesting that this live on PyPI for a while first. I think this is a case similar to what happened with contextlib.ExitStack: I'm not sure if anyone actually *used* contextlib2 for anything significant (if they did, they didn't tell me about it), but just going through the process of properly documenting, publishing and testing it as a relatively independent project forced *me* to think through the design. Add in a couple of interesting conversation with Brandon Rhodes and others about the API design, and the end result was a *vast* improvement over what originally went up on PyPI. The barriers to making use of PyPI libraries are also falling with time, so a solid implementation may still see adoption, even if it's in the form of copy-and-paste programming rather than actual dependencies. I think there are also additional API decisions to be made beyond just the one about how to declare the mapping function, related to how to get the mapped key values *out*, as well as how to find out whether or not two potential key values map to the same actual key. As my preferred bikeshed colour, I'm going to suggest "MappedKeyDict" (using a similar naming style to OrderedDict), since the purpose of the container is to map the domain of the supplied keys to a different range before doing the value lookup. Suggested additional methods: md.map_key(key) # Applies the mapping function to the supplied key md.mapped_keys() # Like keys(), but with the key mapping function applied md.mapped_items() # Like items(), but with the key mapping function applied Another (more dubious) possible method: md.same_key(key1, key2) # "md.map_key(key1) == md.map_key(key2)" Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Le Wed, 11 Sep 2013 22:47:12 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
ExitStack was quite a new thing, API-wise. The proposal here is to generalize something which already exists in various forms all over the Internet, and respecting a well-known API, MutableMapping. There are not many possible APIs to create case-insensitive dicts, or identity dicts. The only API contention until now has been about whether the first or last key would be retained (which settled on the former), and Serhiy's three instantiation schemes. The additional methods you suggest may be nice to have, but they are not necessary and (deliberately) not part of the original proposal. That is, they can be grafted later on.
As my preferred bikeshed colour, I'm going to suggest "MappedKeyDict" (using a similar naming style to OrderedDict),
Ha, another colour! Thanks :-) Regards Antoine.

Antoine Pitrou writes:
What Nick said was "I was too close to the design, and time and a very few good comments greatly improved the product." That's worth considering as generic advice rather than for the specifics of the case he faced compared to yours. Which modules in the stdlib would benefit from rewriting using "transformdict"? How about on PyPI?

Le Wed, 11 Sep 2013 22:50:08 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> a écrit :
"Time and comments" is why I'm posting a discussion thread here and waiting for responses. There are currently 34517 packages on PyPI. A new 34518th one won't magically get a lot of attention :-) (for example, I got very few comments when posting pathlib on PyPI - despite making several actual releases there -, compared to what I got from the PEP 428 discussion on python-ideas) Regards Antoine.

11.09.13 16:50, Stephen J. Turnbull написав(ла):
Which modules in the stdlib would benefit from rewriting using "transformdict"? How about on PyPI?
At least _threading_local, cProfile, doctest, json, and perhaps future implementations of __sizeof__ for some classes would benefit from rewriting using IdentityDict. Unfortunately copy and pickle expose their replacements of IdentityDict to public and can't change them. I don't known anything about general TransformDict use cases.

Am 11.09.13 15:04, schrieb Antoine Pitrou:
There are not many possible APIs to create case-insensitive dicts, or identity dicts.
That is certainly not true. Most obviously, you have the choice of a specialized case-mapping dict, or a generalized type that can be used for case mapping also. Does any of the referenced use cases actually use the API that you are proposing (i.e. with a key transformation function)? FWIW, I can think of yet another API for this: write a Proxy class class TransformedKey: # untested def __init__(self, original): self.original = original self.transformed = self.transform(original) def __hash__(self): return hash(self.transformed) def __eq__(self, other): return self.transformed == other.transformed def transform(self, key): raise NotImplementedError If there is then disciplined use in the form d[TransformedKey(key)] == value then you can use the existing dictionary type. Notice that this can do both case-insensitive and identity dicts (plus there are multiple choices of getting the transform function into it, as well as variations of the __eq__ implementation). There is evidence in this thread that people "grasp" case-insensitive more easily than the generalized API. For the record, I initially typed two responses into the tracker which I found to be incorrect before posting them, so I ended up posting neither. The "transformdict" type is not at all "natural", even though it may be useful. If you really want to "push" this API into 3.4, I think you will need to write a PEP, and find a PEP dictator who is willing to approve it. As you seem to dislike the idea of writing a PEP, I suggest to follow the idea of publishing it on PyPI now, and then proposing it for inclusion into 3.5. Regards, Martin

On Wed, 11 Sep 2013 19:31:56 +0200 "Martin v. Löwis" <martin@v.loewis.de> wrote:
Well, when you have a specialized need, you don't implement a generic version (*), so I don't think anybody implemented it ;-). The point of a generic API is to help replace all specialized implementations at once, rather than a small subset of them. (*) even the Twisted people don't seem to go that far
The problem is "disciplined use" here. This means any consumer of your API has to know about that quirk, so it's an inferior solution. (TransformedKey could be an internal detail of the implementation, as with weak dictionaries; but as a public API it doesn't seem very desirable)
The "transformdict" type is not at all "natural", even though it may be useful.
By that token, nothing is "natural" ;-) defaultdict isn't natural, yet it went in without a PEP. However, to all persons who already had that need and responded to the proposal, the generic version *does* seem natural enough, AFAICT. Only people who've never had such a need seem to fail to "grasp" it (and I only counted one such person, but my memory may be failing me). You know, I'm not against having "only" a case-insensitive dict. But that would be less useful all the while not being significantly easier to use, so I'm not really seeing the point.
What I dislike is the idea of doing additional work because some barriers are imposed ;-). PEP or PyPI are on a similar scale here. At least a PEP would help record the various discussion details, so I'd favour that over the PyPI path. Regards Antoine.

On Wed, Sep 11, 2013 at 10:47:12PM +1000, Nick Coghlan wrote:
I'll join the chorus requesting that this live on PyPI for a while first.
Another alternative would be ActiveState's Python recipes: http://code.activestate.com/recipes/langs/python which is a lot lighter weight than creating a package on PyPI. I believe that ChainMap, namedtuple, groupby, lru_cache and even math.fsum started life on ActiveState. -- Steven

On 09/11/2013 05:47 AM, Nick Coghlan wrote:
Personally, I would not add a PyPI dependency for a single object like transformdict. If we are that nervous about it we can make it provisional, but I don't see that as necessary.
I had the opposite experience. Going through PyDev to get Enum was *far* more educational, informative, and useful than creating an independent package. Plus, transformdict is not that new or different from existing dicts, and most decisions (which key to keep? how to initialize?) can be answered by simply being consistent with what we already have. -- ~Ethan~

On Wed, Sep 11, 2013 at 06:08:25AM -0500, Skip Montanaro wrote:
But the proposal is not for a case-insensitive dict. It is more general than that, with case-insensitivity just one specific use-case for such a transformative dict. Arguably the most natural, or at least obvious, such transformation, but there are others. I have code that does something like this: MAPPING = {'spam': 23, 'ham': 42, 'eggs': 17} result = MAPPING[key.strip()] # later... answer = MAPPING[key] # Oops, forgot to strip! This is broken. Using Antoine's proposal: MAPPING = TransformDict(str.strip) MAPPING.update({'spam': 23, 'ham': 42, 'eggs': 17}) result = MAPPING[key] # later... answer = MAPPING[key] # Fine now. so that the mapping handles stripping the keys, not the caller. This isn't just about strings, and certainly not just case-insensitivity. -- Steven

2013/9/11 Steven D'Aprano <steve@pearwood.info>:
For this use case, you should not keep the key unchanged, but it's better to store the stripped key (so MAPPING.keys() gives you the expected result). The transformdict type proposed by Antoine cannot be used for this use case. The os.environ mapping uses a subprocess of MutableMapping which accepts 4 functions: encoder/decoder for the key and encoder/decoder for the value. Such type is even more generic. transformdict cannot replace os._Environ. (Or do you prefer to store the ' eggs' key?) Victor

On 09/11/2013 06:58 AM, Victor Stinner wrote:
He isn't keeping the key unchanged (notice no white space in MAPPING), he's merely providing a function that will automatically strip the whitespace from key lookups.
The transformdict type proposed by Antoine cannot be used for this use case.
Yes, it can. --> from collections import transformdict --> MAPPING = transformdict(str.strip) --> MAPPING.update({'spam': 23, 'ham': 42, 'eggs': 17}) --> MAPPING transformdict(<method 'strip' of 'str' objects>, {'ham': 42, 'spam': 23, 'eggs': 17}) --> MAPPING[' eggs '] 17 -- ~Ethan~

2013/9/11 Ethan Furman <ethan@stoneleaf.us>:
transformdict keeps the key unchanged, see the first message:
'Foo' is stored as 'Foo', not as 'foo'. So for stripped keys: d=transformdict(str.strip); d[' abc ']; print(list(d)) should print "[' abc ']", not "['abc']". Is it the expected result? Victor

On 09/11/2013 08:49 AM, Victor Stinner wrote:
And indeed it does: Python 3.4.0a1+ (default:833246d42825+, Aug 31 2013, 14:17:59) [GCC 4.7.3] on linux Type "help", "copyright", "credits" or "license" for more information. --> from collections import transformdict --> d=transformdict(str.strip); d[' abc '] = 42; print(list(d)) [' abc ']
Is it the expected result?
Yup! :) -- ~Ethan~

On 12 September 2013 02:03, Ethan Furman <ethan@stoneleaf.us> wrote:
That seems backwards to me. I would think that retrieving the keys from the dict would return the transformed keys (I'd call them canonical keys). That way there's no question about which key is stored - it's *always* the transformed key. In fact, I think this might get more traction if it were referred to as a canonicalising dictionary (bikeshedding, I know). Tim Delaney

On 09/11/2013 02:39 PM, Tim Delaney wrote:
At this point there is still no question: it's the first version of the key seen. For a stupid example: --> d = transformdict(str.lower) --> d['ThePyramid'] = 'Game Show' --> d['AtOnce'] = now() --> for k, v in d.items(): ... print(k, v) Imagine writing a function to get that capitalization right.
In fact, I think this might get more traction if it were referred to as a canonicalising dictionary (bikeshedding, I know).
Whoa, that's way harder to spell! ;) Drop the 'ising', though, and I'm in. -- ~Ethan~

On 09/11/2013 02:39 PM, Tim Delaney wrote:
I would think that retrieving the keys from the dict would return the transformed keys (I'd call them canonical keys).
The more I think about this the more I agree. A canonicaldict with a key function that simply stored the transformed key and it's value would seem to be a lot simpler: - no need to store a separate "presentation" key - no confusion about which of the first key/last key seen is stored - no mistakes with the "first" key not being added before real data and getting the presentation key wrong Further, in order to store the non-canonical keys a separate list must be kept of the keys to preseed the canonicaldict; if we store the canonical keys a separate list must be kept for presentation purposes -- so worst case scenario we're keeping the same amount of information and best-case scenario the presentation of the keys doesn't matter and we just saved ourselves an extra data structure. -- ~Ethan~

Le Thu, 12 Sep 2013 07:08:47 -0700, Ethan Furman <ethan@stoneleaf.us> a écrit :
And it wouldn't solve the use cases. What's the point?
Further, in order to store the non-canonical keys a separate list must be kept of the keys to preseed the canonicaldict;
Yeah, so this is totally silly. What you're basically saying is "we don't need TransformDict since people can re-implement it themselves". Regards Antoine.

On 09/12/2013 07:43 AM, Antoine Pitrou wrote:
Yeah, so this is totally silly. What you're basically saying is "we don't need TransformDict since people can re-implement it themselves".
No, what I'm saying is that the "case-preserving" aspect of transformdict is silly. The main point of transformdict is to enable, for example, 'IBM', 'Ibm', and 'ibm' to all match up as the same key. But why? Because you don't trust the user data. And if you don't trust the user data you have to add the correct version of the key yourself before you ever process that data, which means you already have the correct version stored somewhere. -- ~Ethan~

Le Thu, 12 Sep 2013 08:05:44 -0700, Ethan Furman <ethan@stoneleaf.us> a écrit :
That's assuming there is an a priori "correct" version. But there might not be any. Keeping the original key is important for different reasons depending on the use case: - for case-insensitive dicts, you want to keep the original key for presentation, logging and debugging purposes (*) - for identity dicts, the original key is mandatory because the id() value in itself is completely useless, it's just used for matching (*) For a well-known example of such behaviour, think about Windows filesystems. Regards Antoine.

On 13 September 2013 01:40, Antoine Pitrou <solipsis@pitrou.net> wrote:
In this case though, there are two pieces of information: 1. A canonical key (which may or may not equal the original key); 2. The original key. It seems to me then that TransformDict is a specialised case of CanonicalDict, where the canonical key is defined to be the first key inserted. It would in fact be possible (though inefficient) to implement that using a canonicalising callable that maintained state - something like (untested): class OriginalKeys: def __init__(self):: self.keys = CanonicalDict(str.lower) def __call__(self, key): return self.keys.setdefault(key, key) class OriginalKeyDict(CanonicalDict): def __init__(self):: super().__init__(OriginalKeys()) Tim Delaney

On 13 September 2013 07:29, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
Bah - got myself mixed up with original key and case preserving there ... try this: class OriginalKeys: def __init__(self, func):: self.keys = CanonicalDict(func) def __call__(self, key): return self.keys.setdefault(key, key) class OriginalKeyDict(CanonicalDict): def __init__(self, func):: super().__init__(OriginalKeys(func)) class IdentityDict(OriginalKeyDict): def __init__(self): super().__init__(id) class CasePreservingDict(OriginalKeyDict): def __init__(self): super().__init__(str.lower) Tim Delaney

On Thu, 12 Sep 2013 08:05:44 -0700, Ethan Furman <ethan@stoneleaf.us> wrote:
No, in the *(original use case* (and many other use cases, such as the identity dict) we *do* trust the user data. It is more trustworthy than the key forms used to look up the data inside the program. That's the whole point. (Don't be distracted by Eric's particular use case.) But even for the general case...well, let me repost here what I said on the tracker: In our view, this data structure is for cases where the original key is the most important piece of information (about the keys). The transformation in the lookup process is entirely in the service of looking up the value paired with that original key when there is more than one possible representation of that key. It is the original key that is critical when re-serializing the data or otherwise making use of the keys for anything other than lookup. So this is about making the data structure succinctly model the problem domain, which is what OO is supposed to be good at :) --David

On 09/11/2013 06:58 AM, Victor Stinner wrote:
True, it's more generic -- but is it used more often? Or often enough to have those four functions be part of transformdict instead of just the one encodekey function? (I mean the idea of os._Environ, not that specific implementation.) Personally, I wouldn't mind having all four; for one thing, the name 'transformdict' would then be entirely appropriate. ;) -- ~Ethan~

Le Wed, 11 Sep 2013 07:48:56 -0700, Ethan Furman <ethan@stoneleaf.us> a écrit :
The key decoder function is quite useless since the original key is retained. As for the value encoder/decoder, I don't really see the point: just store whichever value you want to retrieve later. The main point of transformdict is that the *lookup* is done using a different key than the user-provided key. Regards Antoine.

There is a question about specifying the transform function. There are three ways to do this: 1. Positional argument of the constructor. d = TransformDict(str.casefold, Foo=5) 2. Subclassing. class CaseInsensitiveDict(TransformDict): def transform(self, key): return key.casefold() d = CaseInsensitiveDict(Foo=5) 3. Type generator. d = TransformDict(str.casefold)(Foo=5) Second method looks a little simpler to me from implementation side (What if you call __init__ for already initialized object? What if you used the object before calling __init__?). Third method allows you to customize other aspects of dict behavior (combine OrderedDict, defaultdict,..). First method looks less cumbersome from user side at first look. But how many different transform functions you use? Aren't they deserve named classes?

Le Wed, 11 Sep 2013 12:38:13 +0300, Serhiy Storchaka <storchaka@gmail.com> a écrit :
I thought about this first, and then I remembered that python-dev isn't generally very keen on subclassing-based APIs :-)
3. Type generator.
d = TransformDict(str.casefold)(Foo=5)
[...]
Third method allows you to customize other aspects of dict behavior (combine OrderedDict, defaultdict,..).
Well, no, it's not that easy. Especially since OrderedDict and defaultdict weren't written with combination in mind. Regards Antoine.

11.09.13 12:52, Antoine Pitrou написав(ла):
Why? This way looks most natural to me.
They can be rewritten. Actually the defaultdict is just simple wrapper around existing functionality of the __missing__ method. We can add the __transform__ method directly in the dict class. I think it will significant (2-3x) decrease a size of needed C code (no need in defining new type with constructors/destructors/etc, only lookup_transform).

Le Wed, 11 Sep 2013 13:37:53 +0300, Serhiy Storchaka <storchaka@gmail.com> a écrit :
Right now I am not proposing any C implementation of TransformDict. That could be added, but it's not what I'm pushing for here :-) But I don't think the "type generator" API would be easier to implement in C, anyway. Regards Antoine.

On 09/11/2013 02:38 AM, Serhiy Storchaka wrote:
This method follows the precedent of defaultdict: --> from collections import defaultdict --> l = defaultdict(list, foo=['ham','eggs']) --> l defaultdict(<class 'list'>, {'foo': ['ham', 'eggs']}) Without a compelling reason to change, we should keep it consistent. -- ~Ethan~
participants (29)
-
"Martin v. Löwis"
-
Andrea Corbellini
-
Antoine Pitrou
-
Armin Rigo
-
Barry Warsaw
-
Dirkjan Ochtman
-
Eli Bendersky
-
Eric V. Smith
-
Ethan Furman
-
Glenn Linderman
-
Hrvoje Niksic
-
Janzert
-
Joao S. O. Bueno
-
Lukas Lueg
-
MRAB
-
Nick Coghlan
-
Nigel Small
-
Oscar Benjamin
-
Paul Moore
-
Piotr Duda
-
R. David Murray
-
Raymond Hettinger
-
Richard Oudkerk
-
Serhiy Storchaka
-
Skip Montanaro
-
Stephen J. Turnbull
-
Steven D'Aprano
-
Tim Delaney
-
Victor Stinner