python bijection

Raymond Hettinger python at rcn.com
Sat Nov 21 21:22:30 EST 2009


On Nov 19, 3:24 pm, Joshua Bronson <jabron... at gmail.com> wrote:
> I couldn't find a library providing a bijective map data structure
> (allowing for constant-time lookups by value) in the few minutes I
> looked, so I took a few more minutes to code one up:http://bitbucket.org/jab/toys/src/tip/bijection.py
>
> Is this at all worth releasing? Comments and suggestions welcome.
>
> Josh

Hello Joshua,

I have a few design ideas and comments for you.

* The idea of using __call__ for looking-up inverse values was
inspired.  That is useable, clean, and easy to remember; however, as
discussed below, there are issues though with its actual use in real
code.

* Am not excited by the inverse iterators.  With just a regular
mapping you can write:

        for a, b in m.items() ...   # consider either a or b be the
key and the other to be the value

  That meets all of the needs that would have been served by
iter_inverse_keys() or iter_inverse_values() or whatnot.  The mirrored
API doesn't really provide much in the way of value added.

* After exercising the API on a couple of samples, I'm worried that
almost any inverse-method based API makes it difficult to review code
while keeping straight the intended meaning of the forward and inverse
relationships.  Am thinking that it is simpler, faster, and clearer to
just use two dictionaries -- that approach lets the variable names
communicate the important info.  For example, the following code helps
keep the coder and code reviewer from conflating the forward and
inverse directions:

       myurl = ip2url[myip]
       myip = url2ip[myurl]

Contrast that with:

       myurl = site_bijection[myip]
       myip = site_bijection(myurl)

With the latter, it is darned difficult to detect accidental
conflation of brackets with parentheses.

* So, I'm thinking that code needing a bijection would be better-off
with two ordinary dicts, perhaps augmented by a couple of convenience
functions:

        biject_add(site_bijection, ip=myip, url=myurl)    # Create a
new pairing, raise ValueError if either key
                                                          # maps to
more than one value (in violation of the
                                                          # bijection
invariant: one-to-one and onto)

        biject_remove(ip=myip)                            # Clear an
entry from both dicts
                or
        biject_remove(url=myurl)

Alternatively, another possible approach is to used use the class
generation approach (such as that used by named_tuple()) to generate a
custom bijection class with either attribute based or keyworded
accessors:

Attribute based accessors:

      site = Bijection('ip', 'url')
      site.url[myip] = myurl

      for ip, url in site.items() ...
      print site.ip[myurl]
      myurl = site.url.pop(myip)

Keyword accessors:

      site = Bijection('ip', 'url')
      site.set(ip=myip, url=myurl)
      myurl = site.get(ip=myip)
      myip = set.get(url=myurl)
      myurl = site.pop(ip=myip)
      site.del(ip=myip)
      site.del(url=myurl)


Hope these ideas help.  The ultimate success of the Bijection code
will depend on its clarity, simplicity, and speed.  Experiment with
various approaches to find-out which looks the best in real code.  It
cannot be error-prone or it is doomed.  Also, it should not introduce
much overhead processing or else people will avoid it.  The API should
be trivially simple so that people remember how to use it months after
seeing it for the first time.

Good luck and happy hunting,



Raymond



More information about the Python-list mailing list