[issue9819] TESTFN_UNICODE and TESTFN_UNDECODABLE
STINNER Victor
report at bugs.python.org
Fri Sep 10 12:27:56 CEST 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> WARNING: The filename '@test_464_tmp-共有される' CAN be encoded
> by (...) cp932
We should find characters that are not encodable in any Windows code page, but are still accepted in filenames.
> characters like "\u2661" or "\u2668" (...)
mbcs uses "ANSI" code pages: cp1250..cp1258 and cp874 (and maybe others because you wrote that your setup uses cp932):
http://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages
I wrote a short script to find an unencodable filename (attached to this issue). Output:
u'\u0301' is encodable to cp1258
u'\u0363' is not encodable to any code page
u'\u2661' is encodable to cp949
u'\u5171' is encodable to cp932, cp936, cp949, cp950
(CODE_PAGES constant of the script might be incomplete)
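For illustration, a search of this kind could look roughly like the sketch below. It is not the attached find_unencode_filename.py: the CODE_PAGES list and the helper names encodable_code_pages() and describe() are assumptions, and it relies on Python's own codecs, which may not match what Windows does in every case.

```python
# Hypothetical sketch, not the script attached to the issue.
# Assumed list of Windows "ANSI" code pages; it may be incomplete.
CODE_PAGES = (
    ["cp874"]
    + ["cp%d" % number for number in range(1250, 1259)]
    + ["cp932", "cp936", "cp949", "cp950"]
)


def encodable_code_pages(text, code_pages=CODE_PAGES):
    """Return the code pages able to encode every character of text."""
    found = []
    for name in code_pages:
        try:
            text.encode(name)  # strict error handler by default
        except UnicodeEncodeError:
            continue
        found.append(name)
    return found


def describe(text):
    """Format one result line for a candidate string."""
    pages = encodable_code_pages(text)
    if pages:
        return "%a is encodable to %s" % (text, ", ".join(pages))
    return "%a is not encodable to any code page" % (text,)


for candidate in ("\u0301", "\u0363", "\u2661", "\u5171"):
    print(describe(candidate))
```

A candidate is only interesting for TESTFN_UNICODE if the returned list is empty and the filesystem still accepts the name.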
u'\u2661' is not a good candidate. u'\u0363' looks better. But we can mix different characters to limit the probability that the whole string is encodable. Example:
u'\u2661\u5171' is encodable to cp949
u'\u0301\u0363\u2661\u5171' is not encodable to any code page
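The reason mixing helps: a code page can encode a string only if it can encode every character in it, so the code pages for the whole string are the intersection of the per-character sets. A minimal sketch of that reasoning, where CODE_PAGES, code_pages_for_char() and code_pages_for_string() are hypothetical names and the list of code pages is an assumption:

```python
# Hypothetical sketch; the code page list is assumed and may be incomplete.
CODE_PAGES = (
    ["cp874"]
    + ["cp%d" % number for number in range(1250, 1259)]
    + ["cp932", "cp936", "cp949", "cp950"]
)


def code_pages_for_char(char):
    """Return the set of code pages that can encode a single character."""
    pages = set()
    for name in CODE_PAGES:
        try:
            char.encode(name)
        except UnicodeEncodeError:
            continue
        pages.add(name)
    return pages


def code_pages_for_string(text):
    """Intersect the per-character sets: each extra character can only
    shrink the set of code pages able to encode the whole string."""
    pages = set(CODE_PAGES)
    for char in text:
        pages &= code_pages_for_char(char)
    return pages


for candidate in ("\u2661\u5171", "\u0301\u0363\u2661\u5171"):
    print(ascii(candidate), sorted(code_pages_for_string(candidate)))
```

Picking characters whose sets do not overlap (for example one covered only by a Western code page and one covered only by CJK code pages) empties the intersection even if each character alone is encodable somewhere.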
> TESTFN_UNICODE_UNDECODEABLE (2.x)
This is a typo fixed by r83987 in py3k.
----------
Added file: http://bugs.python.org/file18823/find_unencode_filename.py
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________