bags? 2.5.x?
Arnaud Delobelle
arnodel at googlemail.com
Sat Feb 2 09:40:01 CET 2008
On Feb 1, 10:29 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Dan Stromberg <dstrombergli... at gmail.com> writes:
> > > * Is the feature useful for the broad mass?
>
> > Yes, probably, at least if this kind of feature's inclusion in other
> > languages and my two recent needs for it are any indication. In other
> > languages, they are sometimes called bags or multisets.
>
> I use defaultdict(int) all the time, but it's been fairly rare
> to want to do set operations (union, intersection, difference) on
> these. I'm not even sure what the intersection should mean.
Let me write the multiset with 3 x's and 2y y' as {2x, 3y} (this is
not a suggested Python syntax!).
Note: in the following my multisets can have negative multiplicities,
whereas the usual definition only accepts positive multiplicities.
Let H be the collection of all python hashable objects.
Let Z be the set of integers.
So a multiset is a mapping from H to Z which maps cofinitely many
objects
to 0 (a sort of defaultdict(int))
=== For mathematically minded people ===
Well, if one thinks about the mathematical meaning of multisets, then
the collection of all multisets is the free Z-module over H.
Therefore the meaningful operations for multisets are addition,
subtraction and multiplication by a scalar.
=== For not so mathematically minded people ===
One way of thinking about a multiset can be as a vector whose
dimensions are explicitly named, instead of being implicit. For
example in 3D the vector [2, 3, 4] could instead be written as {2x,
3y, 4z}. Just like vectors, multisets can then be added, subtracted
and multiplied by as scalar:
{2x, 3y} + {4x, 7z} = {6x, 3y, 7z}
{4x, 2y, 1z} - {1x, 2y, 3z} = {3x, -2z}
{3x, 2y}*5 = {15x, 10y}
Now if one wants to extend union and intersection to multisets:
* For sets {x, y} union {y, z} = {x, y, z}. The natural way of
extending this to multisets is having the union operator take the
max of the multiplicities of each element, i.e.
{2x, 7y, 1z} union {3y, 1z, 5t} = {2x, 7y, 1z, 5t}
{3x, -5y} union {2z} = {3x, 2z} (!)
* Similarly, for intersection one would take the min of
multiplicities, i.e.
{2x, 7y, 1z} intersection {3y, 1z, 5t} = {3y, 1z}
{3x, -5y} intersection {2z} = {-5y} (!)
* difference and symmetric_difference can be defined in terms of +, -,
union, intersection:
A difference B = A - (A intersection B)
A symmetric_difference B = (A union B) - (A intersection B)
Note that union, intersection, difference and symmetric_difference
defined this way are fully compatible with the corresponding set
operations in the following sense:
If A and B are sets, then
- multiset(A).union(multiset(B)) == multiset(A.union(B))
- multiset(A).intersection(multiset(B)) == multiset(A.intersection(B))
(same for difference, symmetric_difference)
I think this is how I would like my multisets :)
--
Arnaud
More information about the Python-list
mailing list