[Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)

Mon, 25 Nov 2002 09:14:30 +0100

aymond Hettinger wrote:

> Here's a write-up for a proposed dictionary update method.
> Comments are invited while I work on a patch, docs, tests, etc.

Just as data point: you may want to have a look at the
setdict() API in mxTools. It does pretty much what you
are suggesting.

If you want to add this functionality as method, I'd also
suggest to add invdict():

setdict(sequence,value=None)
     Constructs a dictionary from the given sequence. The sequence must contain hashable objects which are used as keys. 
The values are all set to value. Multiple keys are silently ignored. The function comes in handy whenever you need to 
work with a sequence in a set based context (e.g. to determine the set of used values).

invdict(dictionary)
     Constructs a new dictionary from the given one with inverted mappings. Keys become values and vice versa. Note that 
no exception is raised if the values are not unique. The result is undefined in this case (there is a value:key entry, 
but it is not defined which key gets used).

Other useful extensions would be extract() and iremove() as method
on lists:

extract(object,indices[,defaults])
     Builds a list with entries object[index] for each index in the sequence indices.

     If a lookup fails and the sequence defaults is given, then defaults[nth_index] is used, where nth_index is the 
index of index in indices (confused ? it works as expected !). defaults should have the same length as indices.

     If you need the indices as well, try the irange function. The function raises an IndexError in case it can't find 
an entry in indices or defaults.

iremove(object,indices)
     Removes the items indexed by indices from object.

     This changes the object in place and thus is only possible for mutable types.

     For sequences the index list must be sorted ascending; an IndexError will be raised otherwise (and the object left 
in an undefined state).

(Note that the above work on all kinds of sequences/mappings, not just
lists or dictionaries.)

>
>
> Raymond Hettinger
>
>
> ------------------------------------------------------------------------
> METHOD SPECIFICATION
>
>     def sequpdate(self, iterable, value=True):
>         for elem in iterable:
>             self[elem] = value
>         return self
>
>
> USE CASES
>
> # Fast Membership testing
> termwords = dict('End Quit Stop Abort'.split())
>  . . .
> if lexeme in termwords: sys.exit(0)
>
> # Removing duplicates
> seq = dict().sequpdate(seq).keys()
>
> # Initializing or resetting mapped accumulators
> absences = dict('Tom Dick Harry'.split(), 0)
> for name, date in classlog:
>     absences[name] += 1   # Intentionally raises KeyError for invalid names
>
>
> RATIONALE
>
> Examining code in the library, the cookbook, and the vaults of
> parnassus, the most common practice for membership testing is
> a sequential search of a list, tuple, or string.  Where the
> writers were sophisticated and cared about performance, they
> used in-line versions of the python fragment shown above.  I have
> not found a single case where people took the time to "roll their
> own" pure python function as shown above.  Even if they had, the
> construction time would take twice as long as with a C coded method.
> IOW, practices are tending toward slow or verbose approaches in the
> absence of a quick, convenient, standardized tool for moving sequences
> into dictionaries.
>
> The problem of removing duplicates from a sequence is most frequently
> solved with inline use of a code fragment similar to the one shown
> above.  Again, you almost never see people taking the time to build
> their own uniq function, factoring all the effort into a single,
> tested, reusable code fragment.  The exception is Tim's wonderful,
> general purpose recipe for uniquifying anything -- unfortunately, I
> don't think you ever see his code used in practice.
>
> Also, since all containers support __iter__, they can be fed to list();
> however, in the absence of the above method, they cannot be fed to dict()
> without shenanigans for building item lists.  The proposed method makes
> todict() as doable as tolist() and encourages switching to appropriate
> data structures.
>
>
> OBJECTIONS TO THE PRIOR IDEA OF EXPANDING THE DICT CONSTRUCTOR (SF 575224)
>
> 1. The constructor was already overloaded.
> 2. Weird syntax was needed to fit with other constructor syntax.
> 3. The sets module was going to meet all needs.
>
> The first two comments led to this revision in method form.
>
> The Sets module met several needs centering around set mathematics;
> however, for membership testing, it is so slow that it is almost
> always preferable to use dictionaries instead (even without this
> proposed method).  The slowness is intrinsic because of the time
> to search for the __contains__ method in the class and the time
> to setup a try/except to handle mutable elements.  Another reason
> to prefer dictionaries is that there is one less thing to import
> and expect readers to understand.  My experiences applying the
> Sets module indicates that it will *never* replace dictionaries for
> membership testing and will have only infrequent use for uniquification.
>
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/