a 100-line indentation-based preprocessor for HTML

Steve Howell showell30 at yahoo.com
Sat Nov 28 04:04:31 CET 2009


Python has this really neat idea called indentation-based syntax, and
there are folks that have caught on to this idea in the HTML
community.

AFAIK the most popular indentation-based solution for generating HTML
is a tool called HAML, which actually is written in Ruby.

I have been poking around with the HAML concepts in Python, with the
specific goal of integrating with Django.   But before releasing that,
I thought it would be useful to post code that distills the basic
concept with no assumptions about your target renderer.  I hope it
also serves as a good example of what you can do in exactly 100 lines
of Python code.

Here is what it does...

    You can use indentation syntax for HTML tags like table.

    From this...

    table
        tr
            td
                Left
            td
                Center
            td
                Right

    ...you get this:

    <table>
        <tr>
            <td>
                Left
            </td>
            <td>
                Center
            </td>
            <td>
                Right
            </td>
        </tr>
    </table>

    Lists and divs work the same way, and note that attributes are not
    a problem.

    From this...

    div class="spinnable"
        ul
            li id="item1"
               One
            li id="item2"
               Two

    ...you get this:

    <div class="spinnable">
        <ul>
            <li id="item1">
               One
            </li>
            <li id="item2">
               Two
            </li>
        </ul>
    </div>

    You can still use raw HTML tags where appropriate (such as when
    converting legacy markup to the new style).

    From this...

    <table>
        tr
            td
                <b>Hello World!</b>
    </table>

    ...you get this:

    <table>
        <tr>
            <td>
                <b>Hello World!</b>
            </td>
        </tr>
    </table>

And here is the code:

    import re

    def convert_text(in_body):
        '''
        Convert HAML-like markup to HTML.  Allow raw HTML to
        fall through.
        '''
        indenter = Indenter()
        for prefix, line, kind in get_lines(in_body):
            if kind == 'branch' and '<' not in line:
                html_block_tag(prefix, line, indenter)
            else:
                indenter.add(prefix, line)
        return indenter.body()


    def html_block_tag(prefix, line, indenter):
        '''
        Block tags have syntax like this and only
        apply to branches in indentation:

        table
            tr
                td class="foo"
                    leaf #1
                td
                    leaf #2
        '''
        start_tag = '<%s>' % line
        end_tag = '</%s>' % line.split()[0]
        indenter.push(prefix, start_tag, end_tag)


    class Indenter:
        '''
        Example usage:

        indenter = Indenter()
        indenter.push('', 'Start', 'End')
        indenter.push('    ', 'Foo', '/Foo')
        indenter.add ('        ', 'bar')
        indenter.add ('    ', 'yo')
        print(indenter.body())
        '''
        def __init__(self):
            self.stack = []
            self.lines = []

        def push(self, prefix, start, end):
            self.add(prefix, start)
            self.stack.append((prefix, end))

        def add(self, prefix, line):
            if line:
                self.pop(prefix)
            self.insert(prefix, line)

        def insert(self, prefix, line):
            self.lines.append(prefix+line)

        def pop(self, prefix):
            while self.stack:
                start_prefix, end =  self.stack[-1]
                if len(prefix) <= len(start_prefix):
                    whitespace_lines = []
                    while self.lines and self.lines[-1] == '':
                        whitespace_lines.append(self.lines.pop())
                    self.insert(start_prefix, end)
                    self.lines += whitespace_lines
                    self.stack.pop()
                else:
                    return

        def body(self):
            self.pop('')
            return '\n'.join(self.lines)

    def get_lines(in_body):
        '''
        Splits out lines from a file and identifies whether lines
        are branches, leaves, or blanks.  The detection of branches
        could probably be done in a more elegant way than patching
        the last non-blank line, but it works.
        '''
        lines = []
        last_line = -1
        for line in in_body.split('\n'):
            m = re.match(r'(\s*)(.*)', line)
            prefix, line = m.groups()
            if line:
                line = line.rstrip()
                if last_line >= 0:
                    old_prefix, old_line, ignore = lines[last_line]
                    if len(old_prefix) < len(prefix):
                        lines[last_line] = (old_prefix, old_line, 'branch')
                last_line = len(lines)
                lines.append((prefix, line, 'leaf')) # leaf for now
            else:
                lines.append(('', '', 'blank'))
        return lines
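
The rule that Indenter.pop() applies can be tried in isolation: an open
block is closed as soon as a later non-blank line is indented no deeper
than the line that opened it. The sketch below is not part of the 100
lines. close_tags is a hypothetical helper, and it tracks indentation as
an integer width where the code above keeps the prefix string itself.

```python
def close_tags(stack, indent):
    """Pop and return the end tags of every open block that sits at
    `indent` or deeper.  `stack` holds (indent, end_tag) pairs,
    innermost block last, and is modified in place."""
    closed = []
    while stack and indent <= stack[-1][0]:
        open_indent, end_tag = stack.pop()
        # The end tag is written at the indentation of its start tag.
        closed.append(' ' * open_indent + end_tag)
    return closed


# Three blocks are open; a new line arrives at indentation 4.
stack = [(0, '</table>'), (4, '</tr>'), (8, '</td>')]
print(close_tags(stack, 4))   # the td and tr blocks are closed
print(stack)                  # only the table block stays open
```

Calling close_tags(stack, 0) at the end of the input plays the same role
as the self.pop('') call in body(): it flushes whatever is still open.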

As I mention in the comment for get_lines(), I wonder if there are
more elegant ways to deal with the indentation, both of the input and
the output.
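
One possibility for the input side, offered as a sketch rather than a
drop-in replacement: walk the lines in reverse, so that each line already
knows the indentation of the nearest non-blank line below it. That way no
earlier entry has to be patched. The name classify is made up for this
example; it is intended to return the same (prefix, line, kind) triples
as get_lines(), but it has only been checked against small inputs like
the one shown.

```python
def classify(text):
    """Label each line as 'branch', 'leaf', or 'blank'.

    A line is a branch when the next non-blank line is indented
    deeper than it is.  Scanning bottom-up makes that indentation
    available before the line itself is labeled.
    """
    result = []
    next_indent = -1  # indentation of the nearest non-blank line below
    for raw in reversed(text.split('\n')):
        content = raw.strip()
        if not content:
            result.append(('', '', 'blank'))
            continue
        prefix = raw[:len(raw) - len(raw.lstrip())]
        kind = 'branch' if next_indent > len(prefix) else 'leaf'
        result.append((prefix, content, kind))
        next_indent = len(prefix)
    result.reverse()
    return result


sample = "table\n    tr\n        td\n            Left\n\n        td\n            Right"
for prefix, content, kind in classify(sample):
    print('%-6s %s%s' % (kind, prefix, content))
```

The trade-off is that the whole input has to be in memory before the
first line can be labeled, which get_lines() already assumes anyway.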



