[Python-ideas] matrix operations on dict :)

Tue Feb 7 02:00:24 CET 2012

2012/2/6 Simon Sapin <simon.sapin at kozea.fr>:
> Le 06/02/2012 21:01, julien tayon a écrit :
>
>> Proposing vector operations on dict, and acknowledging there was an
>> homeomorphism from rooted n-ary trees to dict, was inducing the
>> possibility of making matrix of dict / trees.
>
>
> Hi,
>
> I studied linear algebra and I think I understand it fairly well. However,
> after reading your email and the linked documentation, I’m just confused. I
> really don’t know what this is about.
>
Because there is some dust under the carpet.

Lets define the notion of dict(dict()) (rooted k-ary trees) as a
vector. Imagine :
tree A
{ a:
    { b : 1,
      c : 2
    },
  e : 3.0
}

This is the same as
vector B
dict(
 tuple([ 'a', 'b' ]) = 1,
 tuple([ 'a', 'c' ]) = 2,
 tuple([ e ]) = 3.0
)
By collapsing the path of intricated dict to a single key (made of the
ordered list of keys to the value)  you always fall back on  a dict of
depth 1.

I can construct A from B and B from A without any loose of properties.
Thus it is equivalent.

(Sparse) Matrix are therefore build this way
dict( tuple( source_tuple, destination_tuple ) = function )
(I could not resolve myself to code stupid matrices and  that have
only magnitude instead of function, therefore my matrices are not the
one of linear algebrae)

so any dict of dict is the same as a  one depth  dict.
A path to a value  defines a dimension, each paths are considered orthogonal.

So further reasoning will be made on vectors / 1 depth level dict.

There are two problems :
1) we can generate an infinite number of key so these are implicitly
incomplete vectors on an infinite base
it is as if when you define :
dict( x = 1) you also mean dict( x = 1, y = null element for
multiplication and neutral for addition , dn = null element for
multiplication and neutral for addtion   .... )
and I hear the tao of python saying explicit is better than implicit.
But a dict is explicitly an incomplete vector on an infinite base.

2) The problem is in the algebrae of the values/leaves because the
implementation is made by delegating addition all the way up to the
value.

Normaly, we think of values of a vector as scalars/magnitude
(float/int for instance), and addition should have  no more
consistency as the consistency of all the additions in a dict for all
the values. Unless you need something else.

I intended to be a little more consistent and to enforce the fact that
no values should have additions properties different than the
properties of a ring.

However, I found no easy way to do this. I would need all the classes
to tell wich algebrae they support by the mean of a property  that
would tell if
Class A  + Class B is commutative/associative/distributive. Still I
can do interesting thing, so I don't really want to castrate the
beast. (Plus, it would mean writing a PEP proposal, which is beyond my
abilities)

> I *think* that you are defining something like a mathematical group[1] or
> ring[2], but:

As far as I am concerned I am pretty confident in having all the
properties of the ring with + & * on dict. And I think I do.
Tell me if I need other tests here :
https://github.com/jul/ADictAdd_iction/blob/master/vector_dict/ConsistentAlgebrae.py#L169
or if I misunderstood any properties. I may have a good intuition, I
may recognize things when I see them, but I am clumsy with words.

>
> * Over what elements? (Any dict? dicts with some property?)

Well, you achieve better algebraic properties if leaves belongs to a
ring at least (float, numpy.array). But I have funny results with
records too.

Since dimension can be considered independant as long as values have
the same type all goes well as long as you do operations supported by
the leaves.
And  if you know what you are doing when mixing up dimension, all goes well.
for instance if with a matrix you multiply a nump.array with a weight
(float) for instance, it has sense.
if you multiply a string by a string, well, you go into trouble, and I
cowardly let the excpetion raise.

For distance of vectors I also run in trouble when dealing with
records like algebrae for + and *  ( list and string ) I see an
elegant workaround wich would be to know what is distance(record1 -
record2) even though I ignore what record1 - record2 is. For instance
with two string or ordered list, the distance can be defined as
edition distance even though string1 - string2 is a nonsense. But I
dont know yet how to make it fit in the puzzle.
Because sometimes norm( A - B ) has more sense than A - B I may need
to refactor the norm method. At this point I may also need
cooperations from the other classes :)

I may admit, I have not thought of everything and there can be some
holes in the racket, but, it is promising, and I had much fun using it
so far.

> * How exactly are your "addition" and "multiplication" (if any) defined?
by doing the following rules :

Addition :
given two vectors (as prior defined), we make the assumption that
these are vectors on an infinit base (made of all the possible paths),
and that when two vectors are added there are two possibles cases :
* if keys exists in both dict : add leaf
* if key in one dict only : create a leaf in the resulting dict with
the value (therefore assuming that undefined keys are neutral to
addition)

For multiplication I just accecpt either dict multiplication (non
existent path default value being the null element)  or
scalar/magnitude multiplication

* if you do 2  * dict( x =  val, y = val2 ) it will do
dict( x =  2 *val, y = 2 *val2 )

if you do dict( x =  val, y = val2 ) * 2  it will do
dict( x =  val * 2, y = val2 * 2 )
(as with vectors)

* if you do dict1 * dict2
any non common keys being implicitly the zero element of
multiplication  it will be
  * for each common keys thre resulting value is  the product of the
leaves of this key.
  * if a key is present in only one dict, the resulting dict get
pruned of the key (unexpressed path are set to the zero of
multiplication)

(it is achived with a silly overloading of + - / * of default dict no
magic is made here).

> * Why? I’m sure I could come up with a well-defined but absurd (and useless)
> "group", but why is yours interesting?
>
* it is fun, (not an argument, I do agree)

* ruby and Perl dont have it :) and I am close to come up with a
jqueryish like grammar of manipulation on tree made of imbricated
dict, (well real tree implemented with a parent property for each node
and attributes and value might be better suited)

* it gives results in the key/value database context and jsonish stuff
à la mongodb
   if you consider what map reduce in key/value database is. It boils
down to retrieving a dict of dict emitting a document (dict of dict)
by the mean of  a projection/or a matrix, and aggregating results
(addition for instance is vastly used) in a reduce operation.
For this it may be quite usefull, it factorizes code in mattrix that
can be stored, combined, added ...

  - in web/text indexing you can split a text to a serie of invariant
form of words with their associated frequency and measure how close
they are from a keyword by using either jaccard or cosinus similarity.
You can already do it, out of the box. I am now fighting my way to see
if I can easily build correlation matrices from two dicts.

   - matrix being vector, and matrix changing trees into trees you can
set a matrix as a matrix value  making a transformation in a subtree,
this is equivalent and  easier as composing functions.

   - if you have database of graphs that have no loops and a root, you
can query for similar path, of find paths in the path (as long as they
can be expressed in the form of dict of dict ..)
(I had too much time I coded a method to do it )

* it has applications  :
   - with the find method + projection + match_subtree , we have
static code analysis.

Imagine an AST. If I don't mistakes it can fit in  a tree (as a dict of dict).
you can find any exactly matching tree (for instance a bad design
pattern)  or close enough tree (using jaccard or cos) and says there
might be a problem in the resulting path of the code.
You can also transform AST in AST thus doing funny things such as on
the fly transorfmation of AST.

* would you like a rotation matrix  for dict( x = , y = , z = )  ? or
a polar transformation ?
* would you like a map/reduce of a tree of products that gives the
total,  the average, and deviance in one pass ?

I can see many applications in transforming trees to trees. I can see
some quirks though (since dict are unordered, in a matrix if a
destination is included in an already existing destination, it should
be forbidden since we cannot ensure the order of the operations, and
this dooms the concept of matrix in matrix).

I may lack a little precision in the wording still.

A langage is as strong as its base types. Giving testoterone to a type
(or creating a new powerfull one) is de facto strengthening a langage.
;)

Regards,
-- 
Julien