[New-bugs-announce] [issue9409] doctest in python2.7 can't handle non-ascii characters
Hugo Lopes Tavares
report at bugs.python.org
Thu Jul 29 03:24:07 CEST 2010
New submission from Hugo Lopes Tavares <hltbra at gmail.com>:
When running my test suite I hit a problem with Python 2.7. The suite passes 100% under Python 2.4, 2.5, 2.6 and 3.2a0, so I suspected a flaw in doctest.
Looking at the code, I found this at doctest.py:1331:
source = example.source.encode('ascii', 'backslashreplace')
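The problem is that my doctest file contains non-ASCII characters. Since example.source is a byte string here, Python 2 must first decode it implicitly with the default ASCII codec before it can encode it, and that implicit decode fails: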
hugo at hugo-laptop:~/issue$ python2.7 example.py
non-ascii.txt
Doctest: non-ascii.txt ... ok
ascii.txt
Doctest: ascii.txt ... ERROR
======================================================================
ERROR: ascii.txt
Doctest: ascii.txt
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/doctest.py", line 2148, in runTest
    test, out=new.write, clear_globs=False)
  File "/usr/local/lib/python2.7/doctest.py", line 1382, in run
    return self.__run(test, compileflags, out)
  File "/usr/local/lib/python2.7/doctest.py", line 1272, in __run
    got += _exception_traceback(exc_info)
  File "/usr/local/lib/python2.7/doctest.py", line 244, in _exception_traceback
    traceback.print_exception(exc_type, exc_val, exc_tb, file=excout)
  File "/usr/local/lib/python2.7/traceback.py", line 125, in print_exception
    print_tb(tb, limit, file)
  File "/usr/local/lib/python2.7/traceback.py", line 69, in print_tb
    line = linecache.getline(filename, lineno, f.f_globals)
  File "/usr/local/lib/python2.7/linecache.py", line 14, in getline
    lines = getlines(filename, module_globals)
  File "/usr/local/lib/python2.7/doctest.py", line 1331, in __patched_linecache_getlines
    source = example.source.encode('ascii', 'backslashreplace')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
----------------------------------------------------------------------
Ran 2 tests in 0.006s
FAILED (errors=1)
hugo at hugo-laptop:~/issue$
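The traceback shows a UnicodeDecodeError rather than a UnicodeEncodeError: calling .encode() on a Python 2 byte string first decodes it with the default ASCII codec, and the 'backslashreplace' handler never applies to that hidden decode. Python 3 bytes have no .encode(), so the minimal sketch below emulates the Python 2 behaviour explicitly; the helper py2_str_encode and the sample source line are hypothetical, for illustration only.

```python
def py2_str_encode(raw: bytes, encoding: str, errors: str = 'strict') -> bytes:
    """Hypothetical emulation of Python 2's str.encode() on a byte string.

    Python 2 first decodes the bytes with the default codec (ASCII, strict)
    and only then encodes the text with `encoding` and `errors`, so
    `errors` never applies to the implicit decode step.
    """
    text = raw.decode('ascii')  # implicit step: raises UnicodeDecodeError
    return text.encode(encoding, errors)


# Hypothetical doctest source line holding UTF-8 "á" (bytes 0xc3 0xa1).
source = "print u'ol\u00e1'\n".encode('utf-8')

try:
    py2_str_encode(source, 'ascii', 'backslashreplace')
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared after the block
    print(error)

# The same handler works fine on text, which is what it was meant for.
print("ol\u00e1".encode('ascii', 'backslashreplace'))
```

The failing byte is 0xc3, the lead byte of a two-byte UTF-8 sequence, just as in the report; only the position differs because the sample line is made up.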
Looking more closely at doctest.py in Python 2.6 and 2.7, I found another inconsistency, this time in how both versions match filenames (I was lucky that the first filename I tried, non-ascii.txt, does not match the regex, since [\w\.] does not accept the hyphen):
__LINECACHE_FILENAME_RE = re.compile(r'<doctest '
r'(?P<name>[\w\.]+)'
r'\[(?P<examplenum>\d+)\]>$')
Here <name> is the file name, but file names are not made up only of alphanumerics and dots. Maybe the pattern should be slightly different, like:
__LINECACHE_FILENAME_RE = re.compile(r'<doctest '
r'(?P<name>.+?)'
r'\[(?P<examplenum>\d+)\]>$', re.UNICODE)
That way any kind of name is accepted. Still, this is only a side issue.
To fix my problem, I propose reverting the first snippet to its Python 2.6 form. The diff would be:
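doctest compiles each example under a pseudo-filename of the form <doctest NAME[N]>, and the patched linecache lookup parses that name back with the regex. The sketch below compares the current and proposed patterns in Python 3; the helper parse and the sample names other than ascii.txt and non-ascii.txt are hypothetical. One caveat: in Python 3, \w matches Unicode word characters by default, whereas Python 2 without re.UNICODE matches ASCII only, so the current pattern is shown only on ASCII names here.

```python
import re

# Pattern currently in doctest.py (Python 2.6/2.7), as quoted above.
CURRENT_RE = re.compile(r'<doctest '
                        r'(?P<name>[\w\.]+)'
                        r'\[(?P<examplenum>\d+)\]>$')

# Pattern proposed above: any non-empty name (non-greedy), then "[N]>".
PROPOSED_RE = re.compile(r'<doctest '
                         r'(?P<name>.+?)'
                         r'\[(?P<examplenum>\d+)\]>$', re.UNICODE)


def parse(pattern, filename):
    """Return (name, example number) if `filename` matches, else None."""
    m = pattern.match(filename)
    return (m.group('name'), int(m.group('examplenum'))) if m else None


# Compare both patterns on a few pseudo-filenames.
for name in ['ascii.txt', 'non-ascii.txt', 'my tests.txt', 'a[1]b.txt']:
    filename = '<doctest %s[0]>' % name
    print(name, parse(CURRENT_RE, filename), parse(PROPOSED_RE, filename))
```

Although .+? is non-greedy, the $ anchor forces it to keep extending until the final [N]> is reached, so a name that itself contains brackets (a[1]b.txt) is still split correctly.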
--- /usr/local/lib/python2.7/doctest.py 2010-07-28 22:07:01.272234398 -0300
+++ doctest.py 2010-07-28 22:20:42.000000000 -0300
@@ -1328,8 +1328,7 @@
         m = self.__LINECACHE_FILENAME_RE.match(filename)
         if m and m.group('name') == self.test.name:
             example = self.test.examples[int(m.group('examplenum'))]
-            source = example.source.encode('ascii', 'backslashreplace')
-            return source.splitlines(True)
+            return example.source.splitlines(True)
         else:
             return self.save_linecache_getlines(filename, module_globals)
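With this change the lookup hands linecache the source lines as they are, so no codec is involved. A minimal Python 3 sketch, assuming the example source is a byte string (a Python 2 str read from the doctest file); proposed_getlines is a hypothetical stand-in for the patched method and SimpleNamespace stands in for doctest's Example object.

```python
from types import SimpleNamespace


def proposed_getlines(example):
    """Hypothetical stand-in for the patched lookup after the diff:
    split the example source into lines, with no ASCII re-encoding."""
    return example.source.splitlines(True)


# Hypothetical example whose source is UTF-8 bytes, like a Python 2 str
# read from a doctest file; SimpleNamespace stands in for doctest.Example.
example = SimpleNamespace(source="x = '\u00e1'\nprint x\n".encode('utf-8'))
lines = proposed_getlines(example)
print(lines)  # the 0xc3 0xa1 bytes pass through untouched
```

Because splitting bytes into lines never decodes them, non-ASCII bytes cannot trigger the implicit ASCII decode. A more conservative variant could keep the backslashreplace step only for sources that are already unicode (for instance behind an isinstance check); that variant is an assumption, not part of the diff above.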
----------
files: ascii.txt
messages: 111881
nosy: hugo
priority: normal
severity: normal
status: open
title: doctest in python2.7 can't handle non-ascii characters
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file18242/ascii.txt
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9409>
_______________________________________