[docs] [issue11186] pydoc: HTMLDoc.index() doesn't support PEP 383

Fri Feb 11 13:55:09 CET 2011

New submission from STINNER Victor <victor.stinner at haypocalc.com>:

If you have undecodable filenames on UNIX, Python 3 escapes the undecodable bytes using surrogates (PEP 383). pydoc's HTMLDoc.index() indirectly uses os.listdir(), which performs this escaping, and the filenames are later encoded to UTF-8 (the whole HTML content is encoded to UTF-8). That encoding step fails because lone surrogates cannot be encoded to UTF-8.
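
The behaviour can be reproduced without touching the file system. The sketch below assumes a UTF-8 locale and applies the surrogateescape error handler by hand, which is what os.listdir() does under PEP 383. The byte name b"caf\xe9.py" is made up for illustration.

```python
# Hypothetical Latin-1 file name as seen under a UTF-8 locale.
raw = b"caf\xe9.py"

# PEP 383: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Strict UTF-8 encoding, as used for the HTML page, rejects lone surrogates.
try:
    name.encode("utf-8")
    failed = False
except UnicodeEncodeError:
    failed = True

# The original bytes are still recoverable with the same error handler.
roundtrip = name.encode("utf-8", "surrogateescape")
```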

In practice, you cannot import such a .py file; you can only run it using "python script.py". So maybe we can just ignore modules with undecodable filenames. For example:

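def isUndecodableFilename(filename):
    # True if the name contains a surrogate code point (U+D800..U+DFFF);
    # PEP 383 escapes fall in U+DC80..U+DCFF.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in filename)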
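
Applied to a directory listing, the check could work as a simple filter. This is only a sketch: the check is repeated so the snippet runs by itself, importable_names() is a hypothetical helper, the file names are made up, and where pydoc would actually call such a filter is not shown.

```python
def isUndecodableFilename(filename):
    # True if the name contains any surrogate code point.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in filename)

def importable_names(names):
    # Hypothetical helper: keep only names that can be encoded to UTF-8.
    return [n for n in names if not isUndecodableFilename(n)]

# Made-up listing; the middle entry mimics what os.listdir() returns
# for an undecodable name under a UTF-8 locale.
names = ["spam.py", b"caf\xe9.py".decode("utf-8", "surrogateescape"), "eggs.py"]
kept = importable_names(names)
```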

Or we can escape the surrogate characters, but I don't know how. Writing "\uDC80" in an HTML document is not a good idea, especially in a URL (e.g. Firefox replaces \ with / in URLs).
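
One possible way to escape, offered only as a sketch: recover the original bytes, percent-encode them for the URL, and use U+FFFD in the visible text. escape_for_index() is a hypothetical helper, not part of pydoc. It assumes the file system encoding is UTF-8, and whether the pydoc server could map such a URL back to the file is not addressed here.

```python
import html
from urllib.parse import quote

def escape_for_index(filename):
    """Return (href, label) for a possibly undecodable file name."""
    # Recover the original bytes; assumes a UTF-8 file system encoding.
    raw = filename.encode("utf-8", "surrogateescape")
    # quote() accepts bytes and percent-encodes every unsafe byte.
    href = quote(raw)
    # For visible text, each undecodable byte becomes U+FFFD,
    # so the result can be encoded to UTF-8 without errors.
    label = html.escape(raw.decode("utf-8", "replace"))
    return href, label

# Made-up name containing the escaped byte 0xE9.
href, label = escape_for_index("caf\udce9.py")
```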

----------
assignee: docs at python
components: Documentation, Library (Lib)
messages: 128382
nosy: docs at python, haypo
priority: normal
severity: normal
status: open
title: pydoc: HTMLDoc.index() doesn't support PEP 383
versions: Python 3.1, Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11186>
_______________________________________