[Python-Dev] tokenize string literal problem
C or L Smith
smiles at worksmail.net
Sat Oct 24 08:04:25 CEST 2009
I'm trying to modify the doctest DocTestParser so it will parse docstring code snippets out of a *py file. (Although doctest can parse these with another method out of *pyc, it is missing certain decorated functions and we would also like to insist of import of needed modules rather and that method automatically loads everything from the module containing the code.)
I need to find code snippets which are located in docstrings. Docstrings, being string literals should be able to be parsed out with tokenize. But tokenize is giving the wrong results (or I am doing something wrong) for this (pathological) case:
A quoted triple quote is not a closing
of this docstring:
>>> print '"""'
""" # <-- this is the closing quote
Here is how I tokenize the file:
import re, tokenize
DOCSTRING_START_RE = re.compile('\s+[ru]*("""|' + "''')")
for ti in tokenize.generate_tokens(o.next):
typ = ti
text = ti[-1]
if typ == tokenize.STRING:
DOCSTRING: ' """\n A quoted triple quote is not a closing\n of this docstring:\n >>> print \'"""\'\n'
DOCSTRING: ' """\n """ # <-- this is the closing quote\n'
There should be only one string tokenized, I believe. The PythonWin editor parses (and colorizes) this correctly, but tokenize (or I) are making an error.
Thanks for any help,
More information about the Python-Dev