[Tutor] perl to python?
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Mon Dec 2 22:24:02 2002
[Note: the post I'm writing below is more about Perl than Python, and more
about understanding Perl's parse trees than about writing a Python
converter, so it's a bit off-topic.]
On Mon, 2 Dec 2002, Lance wrote:
> Is there a Perl to Python conversion program?
Unfortunately, no, not yet.
It should be technically possible to write a program to do this, but the
generated code might end up looking even more miserable than the original
Perl code.
Still, I wonder how hard it would be to cook a toy example up.
What makes such an automatic converter hard is that Perl's grammar doesn't
appear to be really documented anywhere except in the Perl source code.
We'd need a tool that generates a "parse tree" of Perl code; once we had a
parse tree, we might have a good chance of writing a PerlToPython
converter.
In fact, it has been said that "Only Perl can parse Perl":
http://www.everything2.com/index.pl?node=only%20Perl%20can%20parse%20Perl
So such a converter would probably have to use Perl itself to generate the
parse tree. Perl does provide a 'backend' module called B for this.
http://www.perlpod.com/stable/perlcompile.html
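As a rough sketch of how a Python-side converter might get at that tree, it could shell out to Perl and capture whatever B::Terse prints. The helper names `terse_command` and `dump_optree` are made up for this example, and the sketch assumes a `perl` binary with the B modules is on the PATH.

```python
import shutil
import subprocess


def terse_command(script_path, perl="perl"):
    """Build the argv list for `perl -MO=Terse <script>`."""
    return [perl, "-MO=Terse", script_path]


def dump_optree(script_path, perl="perl"):
    """Return the text that B::Terse prints for script_path.

    Raises RuntimeError if no perl binary can be found, and
    subprocess.CalledProcessError if perl rejects the script.
    """
    if shutil.which(perl) is None:
        raise RuntimeError("no %r executable found on PATH" % perl)
    result = subprocess.run(terse_command(script_path, perl),
                            capture_output=True, text=True, check=True)
    # Only stdout is returned; diagnostics on stderr are left alone.
    return result.stdout
```

Calling `dump_optree("hello.pl")` would then hand back the listing as one string, ready for whatever parsing comes next.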
What does a "parse tree" look like? It's a low-level representation of
the program, one node per operation. Here's an example of such a "parse"
of a simple 'hello.pl' program:
######
[dyoo@tesuque dyoo]$ cat hello.pl
print "Hello world\n";
###
It's a simple little program. Here's its "parse tree":
###
[dyoo@tesuque dyoo]$ perl -MO=Terse hello.pl
LISTOP (0x81668b0) leave [1]
    OP (0x81668d8) enter
    COP (0x80fe8d8) nextstate
    LISTOP (0x8166868) print
        OP (0x8166890) pushmark
        SVOP (0x817bec8) const PV (0x80f6d88) "Hello world\n"
hello.pl syntax OK
######
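The capitalized words on the left hand side are the op classes (OP,
LISTOP, SVOP and so on); the lowercase names after the addresses are the
"opcodes", or operation codes. If we visit this "tree" in a postorder
traversal (children first, then their parent), we'll see that: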
1. Perl calls "enter", whatever that means. I think it means that it
will enter the program start.
2. It generates a nextstate, whatever that means.
3. It does a "pushmark" operation, whatever that is.
4. It puts the argument "Hello world\n" on its stack. The 'SV' in SVOP
stands for "scalar value".
5. It calls the 'print' list operator.
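6. Finally, it calls "leave" to exit the program. I don't think the [1]
is a return value; as far as I can tell, Terse prints an op's "targ"
field in those brackets.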
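To make that visiting order concrete, here is a minimal sketch that turns a Terse listing into nested Python lists and walks it children-first. It assumes each child is indented four spaces under its parent (the sample listing below is typed in by hand with that layout), and `parse_terse` and `postorder` are names invented for the example.

```python
import re

SAMPLE = r'''LISTOP (0x81668b0) leave [1]
    OP (0x81668d8) enter
    COP (0x80fe8d8) nextstate
    LISTOP (0x8166868) print
        OP (0x8166890) pushmark
        SVOP (0x817bec8) const PV (0x80f6d88) "Hello world\n"
hello.pl syntax OK
'''

# indent, op class, address (ignored), op name, then whatever is left over.
LINE = re.compile(
    r'^(?P<indent> *)(?P<cls>[A-Z]+) \(0x[0-9a-f]+\) (?P<name>\w+)(?P<rest>.*)$')


def parse_terse(text, indent_width=4):
    """Return the root node; a node is [op_class, op_name, rest, children]."""
    root = None
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # blank lines and the "... syntax OK" trailer
        depth = len(match.group("indent")) // indent_width
        node = [match.group("cls"), match.group("name"),
                match.group("rest").strip(), []]
        if depth == 0:
            root = node
        else:
            stack[depth - 1][3].append(node)  # attach to the enclosing op
        del stack[depth:]
        stack.append(node)
    return root


def postorder(node):
    """Yield op names children-first, the order the steps above describe."""
    for child in node[3]:
        yield from postorder(child)
    yield node[1]


tree = parse_terse(SAMPLE)
order = list(postorder(tree))
print(order)
# ['enter', 'nextstate', 'pushmark', 'const', 'print', 'leave']
```

The sketch does no error handling: a listing whose indentation jumps by more than one level would raise an IndexError.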
Here's a small Python program that's specifically designed to visit this
particular tree. I know that it's incorrect and incomplete (I don't even
understand the opcodes yet! *grin*), but it should give the flavor of
what effort a PerlToPython converter might involve:
###
"""A small program to demonstrate what might be involved in parsing
Perl into Python.

Danny Yoo (dyoo@hkn.eecs.berkeley.edu)
"""

## Each instruction is a tuple: (op class, operands..., children).
## The list of children always comes last.
parse_tree = ("LISTOP", "leave", 1,
              [("OP", "enter",
                [("COP", "nextstate",
                  [("LISTOP", "print",
                    [("OP", "pushmark", []),
                     ("SVOP", "const PV", "Hello world\n", [])])])])])


## Some utility functions that we might need...
def opcode(instruction):
    return instruction[0]

def children(instruction):
    return instruction[-1]

def operands(instruction):
    return instruction[1:-1]


class PerlToPython:
    def __init__(self):
        self.stack = []
        self.lines = []

    def visit(self, instruction):
        ## Dispatch on the op class: "LISTOP" goes to visit_LISTOP, etc.
        op = opcode(instruction)
        dispatch_function = getattr(self, "visit_" + op)
        dispatch_function(instruction)

    def visit_LISTOP(self, instruction):
        for child in children(instruction):
            self.visit(child)
        args = operands(instruction)
        if args[0] == 'print':
            ## This emits a Python 2 style print statement.
            self.lines.append("print " + ','.join(self.stack))
            self.stack = []  ## the arguments have been consumed
        elif args[0] == 'leave':
            self.lines.append("raise SystemExit")

    def visit_OP(self, instruction):
        for child in children(instruction):
            self.visit(child)
        return  ## fixme!

    def visit_COP(self, instruction):
        for child in children(instruction):
            self.visit(child)
        return  ## fixme!

    def visit_SVOP(self, instruction):
        kind, value = operands(instruction)
        if kind == "const PV":
            self.stack.append(repr(value))


if __name__ == '__main__':
    converter = PerlToPython()
    converter.visit(parse_tree)
    print('\n'.join(converter.lines))
###
Here's an example of this in action:
###
[dyoo@tesuque dyoo]$ python perl_into_python.py
print 'Hello world\n'
raise SystemExit
###
Let's look at another Perl parse tree of a slightly more complicated
program:
###
[dyoo@tesuque dyoo]$ cat loops.pl
for ($i = 0; $i < 10; $i++) {
    print "$i\n";
}
[dyoo@tesuque dyoo]$ perl -MO=Terse loops.pl
LISTOP (0x80fa8c8) leave [1]
    OP (0x80fa928) enter
    COP (0x80fa8f0) nextstate
    BINOP (0x8166900) sassign
        SVOP (0x81668e0) const IV (0x80f6d88) 0
        UNOP (0x81668c0) null [15]
            SVOP (0x817bec8) gvsv GV (0x81025f0) *i
    BINOP (0x80fa8a0) leaveloop
        LOOP (0x80fa870) enterloop
        UNOP (0x8104820) null
            LOGOP (0x81047f8) and
                BINOP (0x8166988) lt
                    UNOP (0x8166948) null [15]
                        SVOP (0x8166928) gvsv GV (0x81025f0) *i
                    SVOP (0x8166968) const IV (0x8102590) 10
                LISTOP (0x8104798) lineseq
                    LISTOP (0x8104750) scope
                        OP (0x8104718) null [174]
                        LISTOP (0x8103fb8) print
                            OP (0x8103fe0) pushmark
                            UNOP (0x8103f90) null [67]
                                OP (0x8184868) null [3]
                                BINOP (0x8184840) concat [2]
                                    UNOP (0x8184720) null [15]
                                        SVOP (0x8184700) gvsv GV (0x81025f0) *i
                                    SVOP (0x8184740) const PV (0x8102614) "\n"
                    UNOP (0x8184820) preinc [1]
                        UNOP (0x8104928) null [15]
                            SVOP (0x8104908) gvsv GV (0x81025f0) *i
                    OP (0x8104778) unstack
                    COP (0x81047c0) nextstate
loops.pl syntax OK
###
This tree is a good deal deeper, but the idea is the same: we have to
handle the new op classes, like LOOP, BINOP and LOGOP, and transform them
into their Python equivalents. We would also need to keep track of
things like scope and nesting. Not a conceptually "hard" task, but it
might be a fiddly one.
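As one possible illustration of "transforming into Python equivalents", the sketch below translates the three control pieces of the loop (the `sassign`, the `lt` test and the `preinc`) into Python source text and glues them into a `while` loop. The node layout `(op_name, value, children)` is a hand-simplified assumption rather than anything B::Terse produces directly, `translate` and `c_style_for` are hypothetical names, and the loop body is passed in as ready-made Python lines rather than translated.

```python
# Hand-simplified nodes: (op_name, value, children). Assumed layout,
# typed in by hand from the loops.pl listing.
I_VAR = ("null", None, [("gvsv", "*i", [])])
INIT = ("sassign", None, [("const", 0, []), I_VAR])
COND = ("lt", None, [I_VAR, ("const", 10, [])])
STEP = ("preinc", None, [I_VAR])


def translate(node):
    """Turn one node into a fragment of Python source text."""
    name, value, kids = node
    if name == "null":       # a wrapper with nothing to do: use its child
        return translate(kids[0])
    if name == "const":
        return repr(value)
    if name == "gvsv":       # the Perl global *i becomes the Python name i
        return value.lstrip("*")
    if name == "lt":
        return "%s < %s" % (translate(kids[0]), translate(kids[1]))
    if name == "sassign":    # children are listed as (value, target)
        return "%s = %s" % (translate(kids[1]), translate(kids[0]))
    if name == "preinc":
        return "%s += 1" % translate(kids[0])
    raise NotImplementedError("no translation for op %r" % name)


def c_style_for(init, cond, step, body_lines):
    """Emit a while loop standing in for Perl's for (init; cond; step)."""
    lines = [translate(init), "while %s:" % translate(cond)]
    lines += ["    " + line for line in body_lines]
    lines.append("    " + translate(step))
    return "\n".join(lines)


source = c_style_for(INIT, COND, STEP, ["print(i)"])
print(source)
# i = 0
# while i < 10:
#     print(i)
#     i += 1
```

Even this toy version shows where the fiddly parts hide: a `next` inside the Perl loop body would have to run the step before continuing, which a plain Python `continue` in this `while` loop would skip.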
Alternatively, it might also be possible to translate a Perl parse tree
into a Python parse tree with the help of the 'compiler' module:
http://www.python.org/doc/lib/compiler.html
and then use a program called 'decompyle' to take that Python parse tree
and reproduce human-readable text:
http://www.crazy-compilers.com/decompyle/
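For a feel of what that route might look like today, here is a small sketch. Note the substitution: the 'compiler' module was removed in Python 3, so this uses the standard `ast` module in its place, with `ast.unparse` (Python 3.9 or later) standing in for decompyle. The function name `python_ast_for_print` is invented for the example, and the tree is built by hand rather than derived from a Perl op tree.

```python
import ast
import contextlib
import io


def python_ast_for_print(text):
    """Build the Python AST for the single call print(text, end='').

    end='' is there because Perl's print adds no newline of its own.
    """
    call = ast.Call(
        func=ast.Name(id="print", ctx=ast.Load()),
        args=[ast.Constant(value=text)],
        keywords=[ast.keyword(arg="end", value=ast.Constant(value=""))],
    )
    module = ast.Module(body=[ast.Expr(value=call)], type_ignores=[])
    return ast.fix_missing_locations(module)


tree = python_ast_for_print("Hello world\n")

# Turn the tree back into human-readable source text.
source = ast.unparse(tree)
print(source)

# The same tree can also be compiled and run directly.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(compile(tree, "<perl-to-python>", "exec"))
```

Building AST nodes instead of strings means the converter never has to worry about quoting or indentation; the unparser takes care of both.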
But somehow, I think this might take longer to write than I anticipated.
Still, it is very possible to write this program. It might make a nice
winter project. *grin*
Good luck to you!