code indentation
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Jul 23 23:20:52 EDT 2007
On Mon, 23 Jul 2007 16:53:01 -0300, ...:::JA:::...
<vedrandekovic at v-programs.com> wrote:
>> If you are using the tokenize module as suggested some time ago, try to
>> analyze the token sequence you get using { } (or perhaps begin/end pairs
>> in your own language, that are easier to distinguish from a dictionary
>> display) and the sequence you get from the "real" python code. Then write
>> a script to transform one into another:
>>
>> from tokenize import generate_tokens
>> from token import tok_name
>> from cStringIO import StringIO
>>
>> def analyze(source):
>>     g = generate_tokens(StringIO(source).readline)
>>     for toknum, tokval, _, _, _ in g:
>>         print tok_name[toknum], repr(tokval)
>>
>> I think you basically will have to ignore INDENT, DEDENT, and replace
>> NAME+"begin" with INDENT, NAME+"end" with DEDENT.
>
> So......how can I do this?????????????
> I will appreciate any help!!!!!
Try with a simple example. Let's say you want to convert this:
for x in range(10):
begin
print x
end
into this:
for x in range(10):
    print x
Using the analyze() function above, the former block (pseudo-python) gives
this sequence of tokens:
NAME 'for'
NAME 'x'
NAME 'in'
NAME 'range'
OP '('
NUMBER '10'
OP ')'
OP ':'
NEWLINE '\n'
NAME 'begin'
NEWLINE '\n'
NAME 'print'
NAME 'x'
NEWLINE '\n'
NAME 'end'
ENDMARKER ''
The latter block ("real" python) gives this sequence:
NAME 'for'
NAME 'x'
NAME 'in'
NAME 'range'
OP '('
NUMBER '10'
OP ')'
OP ':'
NEWLINE '\n'
INDENT ' '
NAME 'print'
NAME 'x'
DEDENT ''
ENDMARKER ''
If you feed this token sequence into untokenize, you get back source code
equivalent to the "real" python example above. So, to convert your
"pseudo" python into "real" python, it's enough to convert the first
token sequence into the second, and from that you can reconstruct the
"real" python code. Converting one sequence into the other is a
programming exercise that has nothing to do with the details of the
tokenize module, nor is it very Python-specific: looking at both
sequences, you should be able to figure out how to convert one into the
other. (Hint: a few additional newlines are not important.)
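To see the untokenize step in isolation, here is a small sketch that builds the "real" python token sequence by hand and turns it back into source. It assumes Python 3 (where untokenize accepts plain (type, string) pairs) and an indentation of four spaces for the INDENT token; the name real_tokens is made up for illustration.

```python
import tokenize
from token import NAME, OP, NUMBER, NEWLINE, INDENT, DEDENT, ENDMARKER

# The "real" python token sequence from the listing, as (type, string) pairs.
# The four-space INDENT value is an assumption made for this sketch.
real_tokens = [
    (NAME, "for"), (NAME, "x"), (NAME, "in"), (NAME, "range"),
    (OP, "("), (NUMBER, "10"), (OP, ")"), (OP, ":"),
    (NEWLINE, "\n"),
    (INDENT, "    "),
    (NAME, "print"), (NAME, "x"),
    (DEDENT, ""),
    (ENDMARKER, ""),
]

# With 2-tuples, untokenize only promises that the result tokenizes back
# to the same sequence; the spacing will not match hand-written code.
source = tokenize.untokenize(real_tokens)
print(source)
```

The result has some odd spacing around names and parentheses, but the indentation of the loop body is rebuilt from the INDENT token alone.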
The conversion is even simpler than the example given in the tokenize
documentation, <http://docs.python.org/lib/module-tokenize.html>, whose
example transforms float literals such as 3.1416 into Decimal("3.1416").
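For anyone who wants something to compare their own attempt against, the following is one rough sketch of such a conversion, not a finished tool. It assumes Python 3 (hence print(x) instead of print x), and the function name begin_end_to_indent and its indent parameter are invented for this example. It treats every NAME token spelled begin or end as a block delimiter, so those words cannot be used as ordinary identifiers.

```python
import io
import tokenize


def begin_end_to_indent(source, indent="    "):
    """Rewrite begin/end pseudo-Python as indented Python (rough sketch)."""
    out = []               # (toknum, tokval) pairs to hand to untokenize
    depth = 0              # current begin/end nesting level
    drop_newline = False   # True right after a begin/end keyword
    line_breaks = (tokenize.NEWLINE, tokenize.NL)
    block_marks = (tokenize.INDENT, tokenize.DEDENT)
    readline = io.StringIO(source).readline
    for toknum, tokval, _, _, _ in tokenize.generate_tokens(readline):
        if toknum in block_marks:
            continue       # physical indentation carries no meaning here
        if drop_newline and toknum in line_breaks:
            drop_newline = False
            continue       # swallow the line break that followed begin/end
        drop_newline = False
        if toknum == tokenize.NAME and tokval in ("begin", "end"):
            # "for ...: begin" on one line: close the header line first
            if out and out[-1][0] not in line_breaks + block_marks:
                out.append((tokenize.NEWLINE, "\n"))
            if tokval == "begin":
                depth += 1
                # with 2-tuples, INDENT must carry the full indentation
                out.append((tokenize.INDENT, indent * depth))
            else:
                if depth == 0:
                    raise ValueError("'end' without a matching 'begin'")
                depth -= 1
                out.append((tokenize.DEDENT, ""))
            drop_newline = True
        else:
            out.append((toknum, tokval))
    if depth:
        raise ValueError("'begin' without a matching 'end'")
    return tokenize.untokenize(out)


pseudo = "for x in range(3):\nbegin\nprint(x)\nend\n"
print(begin_end_to_indent(pseudo))
```

The generated source has unusual spacing (a side effect of passing 2-tuples to untokenize), but it compiles, and nested begin/end pairs come out with increasing indentation.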
Once you get this simple case working, you may try what happens with this:
for x in range(10):
    begin
        print x
    end
and this:
for x in range(10): begin
print x
end
and later this:
for x in range(10):
   begin
      print x
 end
You are now using explicit begin/end pairs to group statements, so
indentation is no longer significant. You may want to preprocess the
pseudo-python source, stripping any leading blanks, before using tokenize
- otherwise you'll get indentation errors (which are bogus in your
pseudo-python dialect).
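A naive version of that preprocessing step might look like the sketch below. The function name strip_leading_blanks is made up here, and the approach is deliberately crude: it also strips the inside of triple-quoted strings and continuation lines, which may or may not matter for your dialect.

```python
def strip_leading_blanks(source):
    """Remove leading spaces and tabs from every line (naive sketch)."""
    lines = source.splitlines(keepends=True)
    # lstrip(" \t") leaves the line ending alone, so blank lines survive
    return "".join(line.lstrip(" \t") for line in lines)


messy = "for x in range(10):\n   begin\n      print x\n end\n"
print(strip_leading_blanks(messy))
```

After this step every line starts in column 0, so tokenize no longer produces INDENT or DEDENT tokens and cannot complain about inconsistent dedents.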
Since this will be your own Python dialect, don't expect that someone else
will do the work for you - you'll have to do it yourself. But it's not too
difficult if you do things in small steps. In case you get stuck at any
stage and have specific questions, feel free to ask.
--
Gabriel Genellina