Handling 2.7 and 3.0 Versions of Dict

Martin v. Loewis martin at v.loewis.de
Wed Aug 31 11:55:39 CEST 2011


Am 31.08.2011 03:43, schrieb Travis Parks:
> I am writing a simple algorithms library that I want to work for both
> Python 2.7 and 3.x. I am writing some functions like distinct, which
> work with dictionaries under the hood. The problem I ran into is that
> I am calling itervalues or values depending on which version of the
> language I am working in. Here is the code I wrote to overcome it:
> 
> import sys
> def getDictValuesFoo():
>     if sys.version_info < (3,):
>         return dict.itervalues
>     else:
>         return dict.values
> 
> getValues = getDictValuesFoo()
> 
> def distinct(iterable, keySelector = (lambda x: x)):
>     lookup = {}
>     for item in iterable:
>         key = keySelector(item)
>         if key not in lookup:
>             lookup[key] = item
>     return getValues(lookup)
> 
> I was surprised to learn that getValues CANNOT be called as if it were
> a member of dict. I figured it was more efficient to determine what
> getValues was once rather than every time it was needed.
> 
> First, how can I make the method getValues "private" _and_ so it only
> gets evaluated once?

Not sure what "private" means here. Having the logic selected only once
goes like this

if sys.version_info < (3,):
  def getDictValues(dict):
      return dict.itervalues()
else:
  def getDictValues(dict):
      return dict.values()

> Secondly, will the body f the distinct method be
> evaluated immediately?

Yes.

> How can I delay building the dict until the first value is requested?

Make it a generator:

def distinct(iterable, keySelector = (lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    for v in  getValues(lookup):
        yield v

This delays *building* the dictionary until the *first* value is
requested. I.e. it completes building the dictionary before the first
value is returned.

If you also want to interleave iteration over iterable with fetching
distinct values, write it like that:

def distinct(iterable, keySelector = (lambda x: x)):
    seen = {}
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen[key] = item

> I noticed that hashing is a lot different in Python than it is in .NET
> languages. .NET supports custom "equality comparers" that can override
> a type's Equals and GetHashCode functions. This is nice when you can't
> change the class you are hashing. That is why I am using a key
> selector in my code, here. Is there a better way of overriding the
> default hashing of a type without actually modifying its definition? I
> figured a requesting a key was the easiest way.

You could provide a Key class that takes a hash function and a value
function:

class Key:
  def __init__(self, value, hash, eq):
    self.value, self.hash, self.eq = value, hash, eq
  def __hash__(self):
    return self.hash(self.value)
  def __eq__(self, other_key):
    return self.eq(self.value, other_key.value)

This class would then be used instead of your keySelector.

With that, you could change the dictionary to a set. Actually, you
could already do so in the second generator version:

def distinct(iterable, keySelector = (lambda x: x)):
    seen = set()
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen.add(key) # item is not needed anymore

HTH,
Martin



More information about the Python-list mailing list