bags? 2.5.x?

Arnaud Delobelle arnodel at googlemail.com
Sat Feb 2 03:40:01 EST 2008


On Feb 1, 10:29 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Dan Stromberg <dstrombergli... at gmail.com> writes:
> > > * Is the feature useful for the broad mass?
>
> > Yes, probably, at least if this kind of feature's inclusion in other
> > languages and my two recent needs for it are any indication.  In other
> > languages, they are sometimes called bags or multisets.
>
> I use defaultdict(int) all the time, but it's been fairly rare
> to want to do set operations (union, intersection, difference) on
> these.  I'm not even sure what the intersection should mean.

Let me write the multiset with 3 x's and 2y y' as {2x, 3y} (this is
not a suggested Python syntax!).

Note: in the following my multisets can have negative multiplicities,
whereas the usual definition only accepts positive multiplicities.

Let H be the collection of all python hashable objects.
Let Z be the set of integers.

So a multiset is a mapping from H to Z which maps cofinitely many
objects
to 0 (a sort of defaultdict(int))

=== For mathematically minded people ===

Well, if one thinks about the mathematical meaning of multisets, then
the collection of all multisets is the free Z-module over H.
Therefore the meaningful operations for multisets are addition,
subtraction and multiplication by a scalar.

=== For not so mathematically minded people ===

One way of thinking about a multiset can be as a vector whose
dimensions are explicitly named, instead of being implicit.  For
example in 3D the vector [2, 3, 4] could instead be written as {2x,
3y, 4z}.  Just like vectors, multisets can then be added, subtracted
and multiplied by as scalar:


{2x, 3y} + {4x, 7z} = {6x, 3y, 7z}
{4x, 2y, 1z} - {1x, 2y, 3z} = {3x, -2z}
{3x, 2y}*5 = {15x, 10y}

Now if one wants to extend union and intersection to multisets:

* For sets {x, y} union {y, z} = {x, y, z}.  The natural way of
  extending this to multisets is having the union operator take the
  max of the multiplicities of each element, i.e.

{2x, 7y, 1z} union {3y, 1z, 5t} = {2x, 7y, 1z, 5t}
{3x, -5y} union {2z} = {3x, 2z} (!)

* Similarly, for intersection one would take the min of
multiplicities, i.e.

{2x, 7y, 1z} intersection {3y, 1z, 5t} = {3y, 1z}
{3x, -5y} intersection {2z} = {-5y} (!)

* difference and symmetric_difference can be defined in terms of +, -,
  union, intersection:

A difference B = A - (A intersection B)
A symmetric_difference B = (A union B) - (A intersection B)

Note that union, intersection, difference and symmetric_difference
defined this way are fully compatible with the corresponding set
operations in the following sense:

If A and B are sets, then

- multiset(A).union(multiset(B)) == multiset(A.union(B))

- multiset(A).intersection(multiset(B)) == multiset(A.intersection(B))

(same for difference, symmetric_difference)

I think this is how I would like my multisets :)

--
Arnaud




More information about the Python-list mailing list