[Tutor] How does # -*- coding: utf-8 -*- work?
eryksun
eryksun at gmail.com
Sun Jan 27 00:50:39 CET 2013
On Sat, Jan 26, 2013 at 11:38 AM, Santosh Kumar <sntshkmr60 at gmail.com> wrote:
>
> Everything starting with a hash character in Python is a comment and is
> not interpreted by the interpreter. So how does that work? Give me a
> full explanation.
The comment generates no bytecode, but the tokenizer still reads it: while
compiling the source, CPython looks for an encoding declaration in the
first two lines, using the function get_coding_spec in tokenizer.c.
CPython 2.7.3 source link:
http://hg.python.org/cpython/file/70274d53c1dd/Parser/tokenizer.c#l205
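The rule that get_coding_spec enforces is specified in PEP 263: the declaration is a comment on the first or second line that matches the regular expression given in the PEP. The sketch below approximates that rule in Python 3; it is not a translation of tokenizer.c, and `find_coding` and `stdlib_coding` are hypothetical helper names. For comparison it also calls the standard library's `tokenize.detect_encoding`, which performs the same check (and normalizes names such as 'latin-1' to 'iso-8859-1'), and it ends by showing the declaration's actual effect: the compiler uses it to decode the raw source bytes.

```python
import io
import re
import tokenize

# Regular expression given in PEP 263 for a coding declaration.
# This is a sketch of the rule, not a port of get_coding_spec() in tokenizer.c.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
# A line that is blank or holds only a comment.
BLANK_RE = re.compile(r'^[ \t\f]*(?:[#\r\n]|$)')

def find_coding(lines):
    """Return the declared encoding from the first two lines, or None."""
    for i, line in enumerate(lines[:2]):
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        if i == 0 and not BLANK_RE.match(line):
            break  # line 2 only counts if line 1 is blank or a comment
    return None

def stdlib_coding(source_bytes):
    """Ask the tokenize module which encoding it would use for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

print(find_coding(['# -*- coding: utf-8 -*-']))                    # utf-8
print(find_coding(['#!/usr/bin/env python', '# coding=latin-1']))  # latin-1
print(find_coding(['x = 1', '# coding: latin-1']))                 # None
print(stdlib_coding(b'# coding=latin-1\n'))  # iso-8859-1 (normalized name)
print(stdlib_coding(b'x = 1\n'))             # utf-8 (Python 3 default)

# The declaration changes how the compiler decodes the raw bytes:
# byte 0xE9 is 'é' in Latin-1, so s ends up as that one character.
namespace = {}
exec(compile(b'# coding: latin-1\ns = "\xe9"\n', '<demo>', 'exec'), namespace)
print(namespace['s'] == '\u00e9')  # True
```

Without the declaration, Python 3 would try to decode the same bytes as UTF-8 and raise a SyntaxError, since 0xE9 followed by a quote is not valid UTF-8.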
You can use the parser module to represent the nodes of a parsed
source tree as a sequence of nested tuples. The first item in each
tuple is the node type number. The associated names for each number
are split across two dictionaries. symbol.sym_name maps non-terminal
node types, and token.tok_name maps terminal nodes (i.e. leaf nodes in
the tree). In CPython 2.7/3.3, node types below 256 are terminal.
Here's an example source tree for two forms of encoding declaration:
>>> import parser, symbol, token
>>> src1 = '# -*- coding: utf-8 -*-'
>>> parser.suite(src1).totuple()
(339, (257, (0, '')), 'utf-8')
>>> src2 = '# coding=utf-8'
>>> parser.suite(src2).totuple()
(339, (257, (0, '')), 'utf-8')
As expected, src1 and src2 produce identical trees. Now find the names of node
types 339, 257, and 0:
>>> symbol.sym_name[339]
'encoding_decl'
>>> symbol.sym_name[257]
'file_input'
>>> token.ISTERMINAL(0)
True
>>> token.tok_name[0]
'ENDMARKER'
The root node is type 339 (encoding_decl). Its child is type 257
(file_input), which holds the empty body of the source (to keep it
simple, src1 and src2 contain no statements), terminated by an
ENDMARKER token. Tacked on at the end is the string value of the
encoding declaration (e.g. 'utf-8').
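To read such a tree without looking up each number by hand, a small recursive walker can swap the numbers for names. The sketch below assumes Python 3.5 or later and uses the real `token.ISTERMINAL` and `token.tok_name`. Because the `parser` and `symbol` modules were removed in Python 3.10, it hard-codes the two non-terminal names from the 2.7 session above rather than importing `symbol`; those numbers belong to the 2.7 grammar, so treat the table as an assumption. `label_tree` and `NONTERMINAL_NAMES` are hypothetical names.

```python
import token

# Non-terminal names copied from the CPython 2.7 session above.
# The numbers are specific to that grammar version (an assumption here).
NONTERMINAL_NAMES = {339: 'encoding_decl', 257: 'file_input'}

def label_tree(node):
    """Replace the numeric node types in a parser-style tuple tree with names."""
    if not isinstance(node, tuple):
        return node  # leaf strings such as '' or 'utf-8' pass through unchanged
    kind, *rest = node
    if token.ISTERMINAL(kind):  # kind < 256: a token, named in token.tok_name
        name = token.tok_name[kind]
    else:  # non-terminal: fall back to the number if it is not in the table
        name = NONTERMINAL_NAMES.get(kind, str(kind))
    return (name, *[label_tree(child) for child in rest])

tree = (339, (257, (0, '')), 'utf-8')
print(label_tree(tree))
# ('encoding_decl', ('file_input', ('ENDMARKER', '')), 'utf-8')
```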