Weaver/Yarn Pattern in Python

Christian Stork python-list at cstork.org
Wed Jan 21 07:24:32 CET 2004


Hello everybody,

I am using Python to prototype a compression algorithm for tree-shaped
data, e.g., XML data or abstract syntax trees.  I use a variant of the
visitor design pattern--called weaver/yarn pattern--in order to
traverse the tree that is being compressed or decompressed.  I do not
know of any other interesting application of the weaver/yarn pattern,
but it seems like a very suitable subject of study for different
implementation strategies.  Essentially, I am asking for input on my
design decisions.

First, let me give you a simple example of the traditional visitor
pattern.  The following diagram illustrates a depth-first traversal of
a mini tree by a visitor.  The basic idea is that the tree accepts a
visitor and helps it to do a kind of type-dispatch based on the kind
of nodes.


Tree data structure (nodes a,b,c of types A,B,C)

      a:A      
      / \      
     /   \     
   b:B   c:C   

           Nested calls
           ------------
Call location          Call stack

main:                  a.accept(v)         
   A:                     v.visitA(a)      
      V:                     b.accept(v)   
         B:                     v.visitB(b)
      V:                     c.accept(v)   
         C:                     v.visitC(c)


class A(Node):
    def accept(self, visitor):
        visitor.visitA(self)
class B(Node):
    def accept(self, visitor):
        visitor.visitB(self)
class C(Node):
    def accept(self, visitor):
        visitor.visitC(self)


class Visitor:
    "Perform specific tasks while visiting certain nodes"
    def visitA(self, a):
        for k in a.kids:
            k.accept(self)
    def visitB(self, b):
        pass
    def visitC(self, c):
        pass

The advantage of this ping-pong between the tree and visitor v is that
v encapsulates related processing instructions.  Several different
visitors can be maintained independently of each other and without
forcing changes to the tree node classes.  The tree nodes only need to
provide a node.accept(visitor) method.  Type-checking can ensure
the match between the visitor and the tree data structure.

Normally, visitors encapsulate different processing passes, which are
run one after the other, each time traversing the whole tree.  I have
implemented the compression of trees as several (sub)visitors c1..cN
even though they could have been implemented as one big visitor.
Besides the easy recombinability of visitors this has the added
advantage that I can use the same visitors for compression and
decompression where this is appropriate.

But now I have a problem when decompressing.  In order to run one
visitor after another the first one expects to traverse the whole
tree.  But this is impossible in case of a decompressor.  It lies in
the very nature of the application that the tree is being constructed
while the visitors work on it.  Conceptually the solution is easy.
The decompression subvisitors d1..dM have to process the partially
available tree upto the point of traversal where it is available.  At
each node execution has to iterate over the applicable code of d1..dM
in the given order.  This necessitates a decomposition of visitors
into something that we call yarns and these yarns are weaved by one
visitor, which we call the weaver.  Thus the name "Weaver/Yarn
Pattern" for this variation of the visitor pattern.

The following exemplifies my current implementation of the w/y pattern
for a recursive descent (ie depth-first traversal) visitor.  For each
(sub)visitor d1..dM the former d<i>.visitX(x) method is divided into
several y.weaveX_...() methods.  At entry and exit the weaver invokes
y.weaveX_First() and y.weaveX_Last().  Each descent into a child is
surrounded by y.weaveX_Before(kidno) and y.weaveX_After(kidno) method
calls.

class Yarn:
    # methods replacing visitA(..)
    def weaveA_First(self, a):
        pass
    def weaveA_Before(self, a, kidno):
        pass
    def weaveA_After(self, a, kidno):
        pass
    def weaveA_Last(self, a):
        pass
    # methods replacing visitB(..)
    def weaveB_First(self, a):
        pass
    def weaveB_Last(self, a):
        pass
    # methods replacing visitC(..)
    ...


class Weaver:
    "A special visitor which weaves yarns"
    def __init__(self, yarns):
        self.yarns = yarns
    def visitA(self, a):
        for y in self.yarns:
            y.weaveA_First(a)
        for i, k in enumerate(a.kids):
            for y in self.yarns:
                y.weaveA_Before(a, i)
            k.accept(self)
            for y in self.yarns:
                y.weaveA_After(a, i)
        for y in self.yarns:
            y.weaveA_First(a)
    def visitB(self, b):
        for y in self.yarns:
            y.weaveA_First(b)
        for y in self.yarns:
            y.weaveA_First(b)
    def visitC(self, b):
        ...

By now it should be obvious that the boilerplate for this approach
becomes quite extensive and it would be desirable to reduce it.  To
mitigate the problem I did three things:

- Automatically generate templates for yarn classes.  The actual code
  can be filled in.  Disadvantage: No convenient way to deal with
  changes to the tree data structure.

- The node.accept(weaver) methods are changed to call a "generic"
  weaver.weave(node) method (instead of weaver.visitX(node)), which
  hackishly constructs all the calls to the yarns by assembling the
  method names from the __class__.__name__ and one of "First", "Last",
  "Before", and "After".

This solution does pretty much what I want:

- Writing selected yarn methods allows me to express with little
  overhead /what/ code to execute /when/ in the weaving process.

- Once the weaver/yarn interaction machinery is set up correctly, the
  debugger points me to real code in case of bugs.  This is an
  advantage over approaches that create classes at runtime, e.g., use
  of metaclasses or the "new" module.

OTOH, I'm not perfectly happy with this solution since it's totally
"hackish".  For example, I have to use a function, hand-written
specifially for this purpose, in order to "type-check" the
correspondence of the tree types and the yarns.

Maybe Python is not the right language for an elegant solution and I
should look at more rigorously or differently typed languages, but I
thought it's worth presenting my case here and asking what other
python hackers think.  Maybe there is another more pythonic way to do
this?

I looked into several of python's features, but to no avail.
Iterators don't seem to play well with yarns (especially if you try to
thread objects thru the yarns to support synthesized/inherited
attributes a known from attribute grammars).  I also looked at
metaclasses in order to reduce the boilerplate but that did pay off
either.

There are certainly other languages and language concepts, which offer
interesting implementation alternatives:

- Dylan's generic functions seems to be a a good match for my problem,
  which allows me to get around the method name mangling in
  weaver.weave(node).  Anybody an idea for a nice emulation of generic
  functions in python? :)

- Lazy evaluation as provided by Haskell (or even Ocaml) seems to make
  the traveral by several visitors unnecessary, but I'm not yet
  convinced enough of this approach to start a full reimplementation.

- Systems for attribute grammar evaluation seem to address some of my
  general concerns, but I am afraid that they might constrain me too
  much.

If you're still reading this I'm impressed ;-) and I would be very
happy to hear what you think.

-Chris

-- 
Chris Stork   <>  Support eff.org!  <>   http://www.ics.uci.edu/~cstork/
OpenPGP fingerprint:  B08B 602C C806 C492 D069  021E 41F3 8C8D 50F9 CA2F




More information about the Python-list mailing list