dictionary help

Tue Aug 11 12:10:10 EDT 2009

On Aug 11, 11:51 am, MRAB <pyt... at mrabarnett.plus.com> wrote:
> Krishna Pacifici wrote:
> > Thanks for the help.
>
> > Actually this is part of a much larger project, but I have unfortunately
> > pigeon-holed myself into needing to do these things without a whole lot
> > of flexibility.
>
> > To give a specific example I have the following dictionary where I need
> > to remove values that are duplicated with other values and remove values
> > that are duplicates of the keys, but still retain it as a dictionary.  
> > Each value is itself a class with many attributes that I need to call
> > later on in the program, but I cannot have duplicates because it would
> > mess up some estimation part of my model.
>
> > d =
> > {36: [35, 37, 26, 46], 75: [74, 76, 65, 85], 21: [20, 22, 11, 31], 22:
> > [21, 23, 12, 32], 26: [25, 27, 16, 36], 30: [20, 31, 40]}
>
> > So I want a new dictionary that would get rid of the duplicate values of
> > 21, 22, 36 and 20 and give me back a dictionary that looked like this:
>
> > new_d=
> > {36: [35, 37, 26, 46], 75: [74, 76, 65, 85], 21: [20, 11, 31], 22: [23,
> > 12, 32], 26: [25, 27, 16], 30: [40]}
>
> > I understand that a dictionary may not be the best approach, but like I
> > said I have sort of pigeon-holed myself by the way that I am simulating
> > my data and the estimation model that I am using.  Any suggestions or
> > comments about the above problem would be greatly appreciated.
>
>  >>> d = {36: [35, 37, 26, 46], 75: [74, 76, 65, 85], 21: [20, 22, 11,
> 31], 22: [21, 23, 12, 32], 26: [25, 27, 16, 36], 30: [20, 31, 40]}
>  >>> new_d = {}
>  >>> seen = set(d.keys())
>  >>> for k, v in d.items():
> ...     new_d[k] = [x for x in v if x not in seen]
> ...     seen |= set(new_d[k])
> ...
>  >>> new_d
> {36: [35, 37, 46], 75: [74, 76, 65, 85], 21: [20, 11, 31], 22: [23, 12,
> 32], 26: [25, 27, 16], 30: [40]}

Ha ha, MRAB beat me to it:

d = {
    36: [35, 37, 26, 46],
    75: [74, 76, 65, 85],
    21: [20, 22, 11, 31],
    22: [21, 23, 12, 32],
    26: [25, 27, 16, 36],
    30: [20, 31, 40],
    }

new_d = { # Given, and apparently incorrect.
    36: [35, 37, 26, 46], # 26 is a key and should be gone.
    75: [74, 76, 65, 85],
    21: [20, 11, 31],
    22: [23, 12, 32],
    26: [25, 27, 16],
    30: [40],
    }

expected = {
    36: [35, 37, 46],
    75: [74, 76, 65, 85],
    21: [20, 11, 31],
    22: [23, 12, 32],
    26: [25, 27, 16],
    30: [40],
    }

def removeDuplicates(D):
    '''
    Remove values that are duplicated with other values
    and remove values that are duplicates of the keys.

    Assumes that values in the lists are already unique within
    each list.  I.e. duplicates are only in the keys or in other
    lists.

    This function works "in place" on D, so it doesn't return
    anything.  Caller must keep a reference to D.  Which copy of
    a duplicated value survives depends on D's iteration order.
    '''

    seen = set(D) # Get a set of the keys.

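    # items() works on Python 2 and 3; only existing keys are
    # reassigned below, so the dictionary's size never changes.
    for key, values_list in D.items():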

        # Filter out values that have already been seen.
        filtered_values = [
            value
            for value in values_list
            if value not in seen
            ]

        # Remember newly seen values.
        seen.update(filtered_values)

        D[key] = filtered_values

## Example:
##
##    >>> d == expected
##    False
##    >>> removeDuplicates(d)
##    >>> d == expected
##    True
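
If mutating the dictionary in place is a problem, a variant along the following lines might be worth a look. It is only a sketch: the name without_duplicates is made up for illustration, and the choice to walk the keys in sorted order is an assumption on my part, made so that the result does not depend on the dictionary's own iteration order. Unlike removeDuplicates above, it also drops repeats inside a single list, because each value is added to the seen set as soon as it is kept.

```python
def without_duplicates(d):
    """Return a new dict with duplicated values removed; d is untouched.

    Hypothetical variant of removeDuplicates, for illustration only.
    """
    seen = set(d)  # Keys count as already seen.
    result = {}
    for key in sorted(d):  # Assumed order: sorted keys, for repeatability.
        kept = []
        for value in d[key]:
            if value not in seen:
                seen.add(value)  # Also catches repeats within one list.
                kept.append(value)
        result[key] = kept
    return result


d = {
    36: [35, 37, 26, 46],
    75: [74, 76, 65, 85],
    21: [20, 22, 11, 31],
    22: [21, 23, 12, 32],
    26: [25, 27, 16, 36],
    30: [20, 31, 40],
    }

new_d = without_duplicates(d)
```

With the example data this gives the same dictionary as the expected one above, and d itself still holds the original lists afterwards.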