[issue9819] TESTFN_UNICODE and TESTFN_UNDECODABLE

STINNER Victor report at bugs.python.org
Fri Sep 10 12:27:56 CEST 2010


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

> WARNING: The filename '@test_464_tmp-共有される' CAN be encoded 
> by (...) cp932

We should find a character that is not encodable in any Windows code page, but is still accepted in filenames.

> characters like "\u2661" or "\u2668" (...)

mbcs uses "ANSI" code pages: cp1250..cp1258 and cp874 (and maybe others because you wrote that your setup uses cp932):
http://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages

I wrote a short script to find an unencodable filename (attached to this issue). Output:

u'\u0301' is encodable to cp1258
u'\u0363' is not encodable to any code page
u'\u2661' is encodable to cp949
u'\u5171' is encodable to cp932, cp936, cp949, cp950

(CODE_PAGES constant of the script might be incomplete)

u'\u2661' is not a good candidate. u'\u0363' looks better. But we can mix different characters to limit the probability that the whole string is encodable. Example:

u'\u2661\u5171' is encodable to cp949
u'\u0301\u0363\u2661\u5171' is not encodable to any code page
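
The check behind this output can be sketched roughly as follows. This is not the attached find_unencode_filename.py, only a minimal illustration written for Python 3. It assumes that Python's own cpXXXX codecs are a good enough stand-in for the Windows code pages, and the CODE_PAGES list and the helper names (encodable_code_pages, describe) are hypothetical and might be incomplete.

```python
# Sketch only: not the script attached to the issue.
# CODE_PAGES is an assumed list of "ANSI" code pages; it might be incomplete.
CODE_PAGES = (
    ["cp%d" % n for n in range(1250, 1259)]
    + ["cp874", "cp932", "cp936", "cp949", "cp950"]
)


def encodable_code_pages(text, code_pages=CODE_PAGES):
    """Return the names in code_pages whose codec can encode the whole text."""
    found = []
    for name in code_pages:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no mapping in this code page.
            continue
        found.append(name)
    return found


def describe(text):
    """Format one report line for a candidate string."""
    pages = encodable_code_pages(text)
    if pages:
        return "%a is encodable to %s" % (text, ", ".join(pages))
    return "%a is not encodable to any code page" % (text,)


if __name__ == "__main__":
    # Single characters first, then mixed strings.
    for candidate in ("\u0301", "\u0363", "\u2661", "\u5171",
                      "\u2661\u5171", "\u0301\u0363\u2661\u5171"):
        print(describe(candidate))
```

A mixed string is encodable only if one single code page covers every character in it, so each extra character from a different script can only shrink the list of matching code pages.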

> TESTFN_UNICODE_UNDECODEABLE (2.x)

This is a typo fixed by r83987 in py3k.

----------
Added file: http://bugs.python.org/file18823/find_unencode_filename.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________

