Hello,
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
Since, linear algebrae on dict was coldly welcomed, I waited to have some code to back me up to push my reasoning furhter, and it happily worked the way books predicted.
This was the reasoning :
** see here for a coded explanation http://readthedocs.org/docs/vectordict/en/latest/intro.html#homeomorphism-be...
for a sample of API, code, and result see here. http://readthedocs.org/docs/vectordict/en/latest/matrix.html#api
dict of dict might not be the best way to make trees but, having matrix operations on dict is being able to transform trees in trees natively.
The module is still quite a proof of concept, and it is not the implementation I advocate, but rather the idea. Because : isn't transforming trees into trees quite a recurrent task in modern Computer Science with key value database ?
Plus matrix * tree being side effect free, it is a good candidate for a canonical way to tranform tree in a parallelisable way.
And by the way I implemented matrix as vectordict so ... we have matrix operations on matrix. ^_^ (Brace thourself, InceptionMatrix are coming)
For the «not faint of heart» that are able to read un Perlish un PEP8 code : http://pypi.python.org/pypi/VectorDict/0.3.0
my 2 euro cents (which of course worth more than 2 US cents <:o) ),
Cheers,
PS : I am not sure that using defaultdict as a backend was the best idea of the century, but keys appearing in a dict after an addition where not very much in my idea of how a normal python dict should behave. PPS : I will -if I still have time- code sets operations on dict, issubset, diff, union, intersection. These are quite easy, but so unfun. Since I am not very gifted at explaining, I prefer to code and show the result later.
-- Julien Tayon
Le 06/02/2012 21:01, julien tayon a écrit :
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
Hi,
I studied linear algebra and I think I understand it fairly well. However, after reading your email and the linked documentation, I’m just confused. I really don’t know what this is about.
I think that you are defining something like a mathematical group[1] or ring[2], but:
[1] http://en.wikipedia.org/wiki/Group_%28mathematics%29 [2] http://en.wikipedia.org/wiki/Ring_%28mathematics%29
Regards,
-- Simon Sapin
2012/2/6 Simon Sapin simon.sapin@kozea.fr:
Le 06/02/2012 21:01, julien tayon a écrit :
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
Hi,
I studied linear algebra and I think I understand it fairly well. However, after reading your email and the linked documentation, I’m just confused. I really don’t know what this is about.
Because there is some dust under the carpet.
Lets define the notion of dict(dict()) (rooted k-ary trees) as a vector. Imagine : tree A { a: { b : 1, c : 2 }, e : 3.0 }
This is the same as vector B dict( tuple([ 'a', 'b' ]) = 1, tuple([ 'a', 'c' ]) = 2, tuple([ e ]) = 3.0 ) By collapsing the path of intricated dict to a single key (made of the ordered list of keys to the value) you always fall back on a dict of depth 1.
I can construct A from B and B from A without any loose of properties. Thus it is equivalent.
(Sparse) Matrix are therefore build this way dict( tuple( source_tuple, destination_tuple ) = function ) (I could not resolve myself to code stupid matrices and that have only magnitude instead of function, therefore my matrices are not the one of linear algebrae)
so any dict of dict is the same as a one depth dict. A path to a value defines a dimension, each paths are considered orthogonal.
So further reasoning will be made on vectors / 1 depth level dict.
There are two problems : 1) we can generate an infinite number of key so these are implicitly incomplete vectors on an infinite base it is as if when you define : dict( x = 1) you also mean dict( x = 1, y = null element for multiplication and neutral for addition , dn = null element for multiplication and neutral for addtion .... ) and I hear the tao of python saying explicit is better than implicit. But a dict is explicitly an incomplete vector on an infinite base.
2) The problem is in the algebrae of the values/leaves because the implementation is made by delegating addition all the way up to the value.
Normaly, we think of values of a vector as scalars/magnitude (float/int for instance), and addition should have no more consistency as the consistency of all the additions in a dict for all the values. Unless you need something else.
I intended to be a little more consistent and to enforce the fact that no values should have additions properties different than the properties of a ring.
However, I found no easy way to do this. I would need all the classes to tell wich algebrae they support by the mean of a property that would tell if Class A + Class B is commutative/associative/distributive. Still I can do interesting thing, so I don't really want to castrate the beast. (Plus, it would mean writing a PEP proposal, which is beyond my abilities)
I think that you are defining something like a mathematical group[1] or ring[2], but:
As far as I am concerned I am pretty confident in having all the properties of the ring with + & * on dict. And I think I do. Tell me if I need other tests here : https://github.com/jul/ADictAdd_iction/blob/master/vector_dict/ConsistentAlg... or if I misunderstood any properties. I may have a good intuition, I may recognize things when I see them, but I am clumsy with words.
>
Well, you achieve better algebraic properties if leaves belongs to a ring at least (float, numpy.array). But I have funny results with records too.
Since dimension can be considered independant as long as values have the same type all goes well as long as you do operations supported by the leaves. And if you know what you are doing when mixing up dimension, all goes well. for instance if with a matrix you multiply a nump.array with a weight (float) for instance, it has sense. if you multiply a string by a string, well, you go into trouble, and I cowardly let the excpetion raise.
For distance of vectors I also run in trouble when dealing with records like algebrae for + and * ( list and string ) I see an elegant workaround wich would be to know what is distance(record1 - record2) even though I ignore what record1 - record2 is. For instance with two string or ordered list, the distance can be defined as edition distance even though string1 - string2 is a nonsense. But I dont know yet how to make it fit in the puzzle. Because sometimes norm( A - B ) has more sense than A - B I may need to refactor the norm method. At this point I may also need cooperations from the other classes :)
I may admit, I have not thought of everything and there can be some holes in the racket, but, it is promising, and I had much fun using it so far.
Addition : given two vectors (as prior defined), we make the assumption that these are vectors on an infinit base (made of all the possible paths), and that when two vectors are added there are two possibles cases :
For multiplication I just accecpt either dict multiplication (non existent path default value being the null element) or scalar/magnitude multiplication
if you do dict( x = val, y = val2 ) 2 it will do dict( x = val 2, y = val2 * 2 ) (as with vectors)
(it is achived with a silly overloading of + - / * of default dict no magic is made here).
Why? I’m sure I could come up with a well-defined but absurd (and useless) "group", but why is yours interesting?
it is fun, (not an argument, I do agree)
it gives results in the key/value database context and jsonish stuff à la mongodb if you consider what map reduce in key/value database is. It boils down to retrieving a dict of dict emitting a document (dict of dict) by the mean of a projection/or a matrix, and aggregating results (addition for instance is vastly used) in a reduce operation. For this it may be quite usefull, it factorizes code in mattrix that can be stored, combined, added ...
in web/text indexing you can split a text to a serie of invariant form of words with their associated frequency and measure how close they are from a keyword by using either jaccard or cosinus similarity. You can already do it, out of the box. I am now fighting my way to see if I can easily build correlation matrices from two dicts.
matrix being vector, and matrix changing trees into trees you can set a matrix as a matrix value making a transformation in a subtree, this is equivalent and easier as composing functions.
if you have database of graphs that have no loops and a root, you can query for similar path, of find paths in the path (as long as they can be expressed in the form of dict of dict ..) (I had too much time I coded a method to do it )
it has applications :
Imagine an AST. If I don't mistakes it can fit in a tree (as a dict of dict). you can find any exactly matching tree (for instance a bad design pattern) or close enough tree (using jaccard or cos) and says there might be a problem in the resulting path of the code. You can also transform AST in AST thus doing funny things such as on the fly transorfmation of AST.
I can see many applications in transforming trees to trees. I can see some quirks though (since dict are unordered, in a matrix if a destination is included in an already existing destination, it should be forbidden since we cannot ensure the order of the operations, and this dooms the concept of matrix in matrix).
I may lack a little precision in the wording still.
A langage is as strong as its base types. Giving testoterone to a type (or creating a new powerfull one) is de facto strengthening a langage. ;)
Julien
On 2/6/2012 8:00 PM, julien tayon wrote:
Lets define the notion of dict(dict()) (rooted k-ary trees) as a vector. Imagine : tree A { a: { b : 1, c : 2 }, e : 3.0 }
This is the same as vector B dict( tuple([ 'a', 'b' ]) = 1, tuple([ 'a', 'c' ]) = 2, tuple([ e ]) = 3.0 )
Adding quotes makes it different from tree A. The difference is important because a dict simulates a (sparse) array only if the keys are ordered (or are ordered sequences of ordered objects). Strings are ordered, but not general objects.
There is no need to write tuples as tuple(somelist) dict( ('a', 'b') = 1 ('a', 'c') = 2 ('e',) = 3.0 )
Since your definitions of + and on dicts does not use order, using the terms 'vector' and 'matrix' just seem distracting to me. The only thing you are extracting from them is the idea of component-wise operations on collections. What is important is whether the operations apply to the values*.
Whenever one has a dict whose values are lists, it is common to start with empty lists and add items to the list for each key with d[key].append(val) You could imagine this operation as performing your dict addition d + {key:[val]} in place and then performing standard list addition in place (.extend). But thinking this way has limited use.
In actual application, the code is likely to be something like: for key,val in source: d.get(key,[]).append(val)
There are three points here.
The same points apply to lists. list1 + list2 is rare compared to list1.append(item) and list1.extend(iterable_of_items). And of course, both apply to all lists and objects and iterables, rather than specialized subcategories.
-- Terry Jan Reedy
2012/2/7 Terry Reedy tjreedy@udel.edu: >
Since your definitions of + and on dicts does not use order, using the terms 'vector' and 'matrix' just seem distracting to me. The only thing you are extracting from them is the idea of component-wise operations on collections. What is important is whether the operations apply to thevalues*.
I have checked I can go back and forth without problem. but oky I forgot the quote in A. I know it may have been disturbing, and I apologize.
Well, order in actual notation is a commodity for not repeating the dimension's name. it's easier to write [ 1 , 2, 3 ] then to always repeat [ x = 1, y = 2, z = 3 ] I am just going back to the basis.
Our disagreement mainly comes from me forgetting the quote in A.
>
In actual application, the code is likely to be something like: for key,val in source: d.get(key,[]).append(val)
It does not propagate recursively though. so adding d1 = { 'a' , { 'b' , { 'c' : 1 } , 'd' : 2 } with d2 = { 'a' , { 'b' , { 'c' : 2 } , 'd' : 1 } wont work with yout example, but will work with my definition of d1 + d2.
There are three points here.
These patterns are specific to subcategories of dicts. It has sense too for non scalar value, it was just already tough trying to explain with scalars, so I limited myself to the simple case.
For dicts (and sets and lists), in-place modification is more common than creating new dicts (or sets or lists). Python is not mathematics, and it is not a functional, side-effect-free language.
I just have to switch a Flag in the matrix operator to make it operate in place. I was not sure wich option was best.
well, you are definitely right on this point.
>
The same points apply to lists. list1 + list2 is rare compared to list1.append(item) and list1.extend(iterable_of_items). And of course, both apply to all lists and objects and iterables, rather than specialized subcategories.
I also provide an iterator in the form [ ( (path_tovalue), value ), ... ] ^_^
since it does recursive calls, I don't like it, and I try not to make it too obvious.
Cheers, Julien
On Mon, Feb 06, 2012 at 09:01:29PM +0100, julien tayon wrote:
Hello,
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
This seems interesting to me, but I don't see that they are important enough to be built-in to dicts.
At most, this could be a module in the standard library, but before that happens, you would have to prove the usefulness of the module. I suggest polishing it to a fit state to use in production, including tests, and putting it on PyPI. Once you can demonstrate some interest for it, then you can propose it gets added to the std lib.
Otherwise, this looks rather like a library of functions looking for a use. It might help if you demonstrate what concrete problems this helps you solve.
-- Steven
2012/2/7 Steven D'Aprano steve@pearwood.info:
This seems interesting to me, but I don't see that they are important enough to be built-in to dicts.
At most, this could be a module in the standard library, but before that happens, you would have to prove the usefulness of the module. I suggest polishing it to a fit state to use in production, including tests, and putting it on PyPI. Once you can demonstrate some interest for it, then you can propose it gets added to the std lib.
Of course, it's already on pypi, the unittest are being buit up, I just coded way too much stuff, so code coverage is slowly increasing. Since it's 90% syntaxic sugar, it is just a commodity for syntax of tree manipulation. I can improve the readability though.
But,
Otherwise, this looks rather like a library of functions looking for a use. It might help if you demonstrate what concrete problems this helps you solve.
Since 95% of the functions are method of a dict, I guess, we may call it an object.
Cheers,
On Feb 6, 2012, at 12:01 PM, julien tayon wrote:
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
And if you add tensor operations, the implementation can remain independent of the system of reference :-)
Contravariantly yours,
Raymond
On Feb 6, 2012, at 8:17 PM, Raymond Hettinger wrote:
On Feb 6, 2012, at 12:01 PM, julien tayon wrote:
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
And if you add tensor operations, the implementation can remain independent of the system of reference :-)
Contravariantly yours,
Raymond
+1
On 06.02.2012 21:01, julien tayon wrote:
Hello,
Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees.
Since, linear algebrae on dict was coldly welcomed, I waited to have some code to back me up to push my reasoning furhter, and it happily worked the way books predicted.
Why would you want to use a hash table (Python dict) for linear algebra? Not sure I can think of a worse datastructure for the purpose.
There are NumPy... And in the standard library there is an array module...
For matrix multiplication you can use DGEMM from any LAPACK library if you don't like NumPy (e.g. by means of ctypes).
What really should be discussed is inclusion of NumPy in the standard library (that is NumPy, not SciPy).
Sturla
Why would you want to use a hash table (Python dict) for linear algebra?
Because it naturally provides matrix. And matrix are an easy way to formalize and standardize tree manipulations which are a growing concern in real life computer craft.
Because actual CS is precise but not exact, and that metrics on objects enable more exact comparison :
== is the actual way to compare it is precise , but metrics (cos, norm, dot) enable is_close_to( Value , modulo error )
For instance no actual langage can tell if two floats are equal, because, there are error margins.
pi != 3.14159 pi is close to 3.1 [+-.05]
Exactitude and precision are not the same.
Not sure I can think of a worse datastructure for the purpose.
Well, it is the other way round. The least surprise principle is that + - * / behave the way it usually does 90% of the time
at my very personnal opinion, mathematical signs should have been reserved in all langages to operation analog to mathematics. And linear algebrae is one of the most accpeted behaviour for these symbols.
Since there are more than one way to add / mul / div /sub, at my very own opinion every and each class defining these signs should tell which arithmetics they supports, so that we can predict the behaviour of the composition of these operations and conflicts.
We have same symbols, with different meaning, it is a degenerescence that should disambiguized(? not sure of the orthograph). For instance in order to raise inconsistency exceptions.
>
There are NumPy... And in the standard library there is an array module...
Which at the opposite of list() supports + - * / in the algebraic sense.
>
For matrix multiplication you can use DGEMM from any LAPACK library if you don't like NumPy (e.g. by means of ctypes).
It is stupid to code matrix with an hash, I just say as there is a strong analogy between dict and vectors, as a result matrix that operates on dict exists and I can give them a meaning of transforming rooted trees in rooted trees.
What really should be discussed is inclusion of NumPy in the standard library (that is NumPy, not SciPy).
+1 for the inclusion of numpy in stdlib :) Even though I think it would need a little syntaxic sugar to make it more pythonic.
Cheers,
Jul
On Wed, Feb 8, 2012 at 10:12 PM, julien tayon julien@tayon.net wrote:
What really should be discussed is inclusion of NumPy in the standard library (that is NumPy, not SciPy).
+1 for the inclusion of numpy in stdlib :)
There's more to stdlib inclusion than "hey, wouldn't it be nice if <X> was part of the stdlib?". It needs to make sense to do so, usually by providing a tangible benefit to the overall Python ecosystem. For smaller projects (especially predominantly single person projects), stdlib adoption comes with a guarantee of some level of long term support (in particular, making sure the module continues to work with newer versions of Python and on newer operating system releases).
That isn't really the case with NumPy - it has a sizable developer base of its own, along with solid backing from Enthought. Incorporation into the standard library would be a lot of pain for minimal gain. If it helps, just consider SciPy Python's "stdlib++" if you're doing any kind of heavy number crunching with Python. There's a reason the PyPy folks were able to raise money to sponsor their NumPyPy compatibility effort - it's because the SciPy ecosystem is centred around NumPy, and NumPyPy promises to let developers enjoy the benefit's of PyPy without losing access to SciPy.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 08.02.2012 13:36, Nick Coghlan wrote:
That isn't really the case with NumPy - it has a sizable developer base of its own, along with solid backing from Enthought.
I thing you got this the wrong way. Inclusion in the stdlib requires long-term support, it is not a way to ensure long-term support for projects that don't have it. (And NumPy is likely to be supported for a very long time.)
Incorporation into the standard library would be a lot of pain for minimal gain. If it helps, just consider SciPy Python's "stdlib++" if you're doing any kind of heavy number crunching with Python. There's a reason the PyPy folks were able to raise money to sponsor their NumPyPy compatibility effort - it's because the SciPy ecosystem is centred around NumPy, and NumPyPy promises to let developers enjoy the benefit's of PyPy without losing access to SciPy.
NumPy is not just for number-crunching. It is also a general memory abstraction, a mutable container for any kind of binary data, for any kind of bit and byte fiddling, reading and parsing binary data, memory mapping binary files, etc. Another use-cases are computer graphics and image processing.
Sturla
On 08.02.2012 13:12, julien tayon wrote:
No, it naturally provides a hash-table, which is a simple in-memory database, not a matrix.
at my very personnal opinion, mathematical signs should have been reserved in all langages to operation analog to mathematics. And linear algebrae is one of the most accpeted behaviour for these symbols.
There is a world beyond linear algebra. Sometimes we need to do things that cannot easily be fit into the semantics of matrix operations. And for those that only can think in terms of matrices there are languages called Matlab, Scilab, and Octave.
It is stupid to code matrix with an hash, I just say as there is a strong analogy between dict and vectors,
No there is not. A vector is ordered, a hash-table (dict) is unordered.
In a vectorlike structure, e.g. a Python list, element i+1 is stored subsequently to element i.
In a hash-table, e.g. a Python dict, element hash(i+1) is not stored subsequently to element hash(i).
Sturla
Sturla Molden wrote:
On 08.02.2012 13:12, julien tayon wrote:
It is stupid to code matrix with an hash, I just say as there is a strong analogy between dict and vectors,
No there is not. A vector is ordered, a hash-table (dict) is unordered.
Not necessarily. There is nothing in the API for Python lists that requires that elements are stored in one continuous array. That's a side-effect of the implementation.
You are focusing too much on accidental implementation details and not enough on the fundamental concept of "vector" or "hash table".
Fundamentally, a "dict" is a data structure that associates arbitrary keys to values, such that each key is unique but values may not be. Note that the use of a hash table for dicts (mappings) is just one possible implementation.
Fundamentally a list is a mapping from sequential (and therefore unique) integer keys (the indexes) to values. Note that a linear array with the key (index) being implicit rather than explicit is just one possible implementation.
I have no opinion on whether Julien's proposal is useful or not, but vectors (lists) can be implemented using mappings (dicts). Lua is proof of this: their table type operates as both list and dict.
http://lua-users.org/wiki/TablesTutorial
-- Steven
On 2/8/2012 7:51 PM, Steven D'Aprano wrote:
Not necessarily. There is nothing in the API for Python lists that requires that elements are stored in one continuous array. That's a side-effect of the implementation.
I believe NumPy uses multiple blocks, as does deque.
-- Terry Jan Reedy
On 2/9/12 2:44 AM, Terry Reedy wrote:
On 2/8/2012 7:51 PM, Steven D'Aprano wrote:
Not necessarily. There is nothing in the API for Python lists that requires that elements are stored in one continuous array. That's a side-effect of the implementation.
I believe NumPy uses multiple blocks, as does deque.
numpy uses uniformly strided memory starting from a single memory location, which is not quite the same as using multiple blocks.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 09.02.2012 03:44, Terry Reedy wrote:
I believe NumPy uses multiple blocks, as does deque.
No it does not (see Robert Kern's reply).
But a lot of numerical codes in C or Java do, using an array of pointers (C) or an array of arrays (Java) to emulate a two dimensional array, particularly numerical code written by amateurs. I've also seen this in Python, using (heaven forbid) lists of lists as a 2D array replacement. It is sad that Numerical Receipes encourages this coding style. (Actually the third edition does not, but it is not sufficient to remedy the damage.) Those who don't understand why "jagged arrays" can be a problem should stick to Matlab, Fortran or NumPy.
Sturla