OT: Ultimate Language Syntax Cleanness Comparison
pyth at devel.trillke.net
Sat Feb 8 18:29:47 CET 2003
Jim Richardson wrote:
> On 7 Feb 2003 19:55:35 -0800,
> Jeremy Fincher <tweedgeezer at hotmail.com> wrote:
> > holger krekel <pyth at devel.trillke.net> wrote in message news:<mailman.1044658940.11235.python-list at python.org>...
> >> I was actually quite surprised to find out (together with a perl-friend)
> >> that there is no easy way to parse perl. All the methods involve
> >> evaluating/executing it at the same time. Cool, isn't it.
> > That's not true. Perl *is* compiled to a bytecode format (what do you
> > think all the jazz about Parrot is for?) There's no easy way to lex
> > Perl separate from parsing it. Lexing and parsing Perl code is one in
> > the same. Evaluating it is entirely separate.
> > Jeremy
> *raises hand in ignorance*
> Can someone explain the differences between them? that is, evaluating,
> parsing and lexing? they seem pretty synonymic (ick!) to me, so I am
> missing something. What?
Sure. It goes like this with Python (and often with other languages):
lexing/tokenizing -> parsing -> compiling -> evaluating/executing
Let's go through it step by step. Let's assume we have the string:
First we lex it into 'tokens':
1,0-1,4: INDENT ' '
1,4-1,6: NAME 'if'
1,7-1,8: NUMBER '1'
1,8-1,9: OP ':'
1,9-1,10: NEWLINE '\n'
1,22-1,27: NAME 'print'
1,28-1,35: STRING '"hello"'
1,35-1,36: NEWLINE '\n'
2,0-2,0: DEDENT ''
2,0-2,0: ENDMARKER ''
So now you have the 'tokens', and the parser no longer has to
care about whitespace, among other things. Note that the
tokenizer spits out 'INDENT' and 'DEDENT' tokens because
indentation is significant in Python. But the details are abstracted away.
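The tokenizing step can be tried out with the standard tokenize module. This is only a minimal sketch for a current Python 3: the sample source below is an assumption (it is not the exact string used in this post), the helper name lex is made up for illustration, and print is a function call here, so the token list has a couple of extra OP tokens.

```python
import io
import tokenize

# Assumed sample source, close to the if-statement discussed in the post.
source = 'if 1:\n    print("hello")\n'

def lex(text):
    """Return (token name, token text) pairs for a source string."""
    readline = io.StringIO(text).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]

tokens = lex(source)
for name, text in tokens:
    # One token per line, e.g. the kind of token and its literal text.
    print(f"{name:10} {text!r}")
```

The INDENT and DEDENT entries in the printed list are the tokenizer's way of telling the parser where a block starts and ends, so the parser never has to count spaces itself.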
Next the parser:
>>> import compiler
None)]))], None)])) # slightly reformatted
This already reflects the *syntax* of our little if-statement.
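For readers on Python 3, where the compiler package no longer exists, roughly the same parsing step can be sketched with the standard ast module. The sample source is again an assumption, not the string from this post, and the node names differ from the compiler package's output.

```python
import ast

# Assumed sample source, close to the if-statement discussed in the post.
source = 'if 1:\n    print("hello")\n'

# Parse the text into a syntax tree; nothing is executed here.
tree = ast.parse(source)

# The module body holds one statement: the if-statement.
if_node = tree.body[0]

print(type(if_node).__name__)
print(ast.dump(tree))
```

The tree only describes structure: an if-statement with a test expression and a body. Whether the test is true is not decided at this stage.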
The compiler takes this syntax tree and compiles it into bytecode:
>>> compiler.compile(s, '', 'exec')
<code object <module> at 0x827efb0, file "", line 1>
>>> compiler.compile(s, '', 'exec').co_code
And here you see the actual bytecode that is executed by
the Python interpreter (aka VM). You can get a human-readable
disassembly of it with the dis module:
>>> import dis
>>> dis.dis(compiler.compile(s, '', 'exec'))
0 SET_LINENO 0
3 SET_LINENO 1
6 LOAD_CONST 1 (1)
9 JUMP_IF_FALSE 12 (to 24)
13 SET_LINENO 2
16 LOAD_CONST 2 ('hello')
21 JUMP_FORWARD 1 (to 25)
>> 24 POP_TOP
>> 25 LOAD_CONST 0 (None)
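The compiling and executing steps can be sketched the same way on Python 3 with the built-in compile, dis and exec. This is an illustration under assumptions: the sample source and the '<example>' filename are made up, and the instructions printed by dis depend on the interpreter version, so they will not match the listing above (SET_LINENO, for instance, is gone from later versions).

```python
import contextlib
import dis
import io

# Assumed sample source, close to the if-statement discussed in the post.
source = 'if 1:\n    print("hello")\n'

# Compile: source text -> code object holding the bytecode.
code = compile(source, '<example>', 'exec')
raw = code.co_code          # the raw bytecode as a bytes object

# Human-readable view of the same bytecode (version dependent).
dis.dis(code)

# Execute: only now does the program actually run.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(code)
output = buf.getvalue()
print(repr(output))
```

Note that the code object can be inspected and disassembled without ever being run; execution is a separate, final step.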
Does that clarify a bit what the steps
tokenizing -> parsing -> compiling -> executing
are about?