[New-bugs-announce] [issue34347] AIX: test_utf8_mode.test_cmd_line fails
Michael Felt
report at bugs.python.org
Mon Aug 6 12:06:50 EDT 2018
New submission from Michael Felt <michael at felt.demon.nl>:
The test fails because
byte_str.decode('ascii', 'surrogateescape')
does not match what ascii() reports for the same bytes when they are passed on the command line.
Assumption: since "check('utf8', [arg_utf8])" succeeds, the parsing of the command line itself is correct.
DETAILS
>>> arg = 'h\xe9\u20ac'.encode('utf-8')
>>> arg
b'h\xc3\xa9\xe2\x82\xac'
>>> arg.decode('ascii', 'surrogateescape')
'h\udcc3\udca9\udce2\udc82\udcac'
I am having a difficult time getting the syntax correct for all the "escapes", so I added a print statement in the check routine:
test_cmd_line (test.test_utf8_mode.UTF8ModeTests) ...
code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:UTF-8:['h\xe9\u20ac']
code:import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:]))) arg:b'h\xc3\xa9\xe2\x82\xac'
out:ISO8859-1:['h\xc3\xa9\xe2\x82\xac']
Test code with my debug statement (used to generate the output above):
    def test_cmd_line(self):
        arg = 'h\xe9\u20ac'.encode('utf-8')
        arg_utf8 = arg.decode('utf-8')
        arg_ascii = arg.decode('ascii', 'surrogateescape')
        code = 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.argv[1:])))'

        def check(utf8_opt, expected, **kw):
            out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
            print("\ncode:%s arg:%s\nout:%s" % (code, arg, out))
            args = out.partition(':')[2].rstrip()
            self.assertEqual(args, ascii(expected), out)

        check('utf8', [arg_utf8])
        if sys.platform == 'darwin' or support.is_android:
            c_arg = arg_utf8
        else:
            c_arg = arg_ascii
        check('utf8=0', [c_arg], LC_ALL='C')
So the first check succeeds:
check('utf8', [arg_utf8])
But the second does not:
FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 225, in test_cmd_line
    check('utf8=0', [c_arg], LC_ALL='C')
  File "/data/prj/python/src/python3-3.7.0/Lib/test/test_utf8_mode.py", line 218, in check
    self.assertEqual(args, ascii(expected), out)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
: ISO8859-1:['h\xc3\xa9\xe2\x82\xac']
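The mismatch can be reproduced without a subprocess. This is a minimal sketch, assuming (as the "ISO8859-1" in the output suggests, but is not confirmed here) that under LC_ALL=C the argument bytes are decoded with the locale encoding rather than as ASCII with surrogateescape. The variable names are only illustrative.

```python
# Same bytes that the test passes on the command line.
arg = 'h\xe9\u20ac'.encode('utf-8')

# What the test expects for LC_ALL=C: ASCII with surrogateescape.
expected = ascii([arg.decode('ascii', 'surrogateescape')])

# Assumption: the bytes are decoded as ISO8859-1 (latin-1) instead,
# where every byte maps to the code point with the same value.
observed = ascii([arg.decode('latin-1')])

print("expected:", expected)
print("observed:", observed)
print("equal:", expected == observed)
```

If the assumption holds, "observed" corresponds to the left-hand side of the AssertionError above and "expected" to the right-hand side.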
I tried using arg as the "expected" value, but arg is still a bytes object, while the command-line result is a str (and is printed as such):
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != "[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
: ISO8859-1:['h\xc3\xa9\xe2\x82\xac']
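The extra b in this diff comes from calling ascii() on a bytes object. A short sketch of that difference follows, together with one way to obtain a str that carries the same escapes. Decoding with 'latin-1' is an assumption based on the ISO8859-1 reported in the output, not a confirmed fix for the test.

```python
arg = 'h\xe9\u20ac'.encode('utf-8')

# ascii() of a list holding bytes keeps the b prefix ...
as_bytes = ascii([arg])

# ... while sys.argv holds str objects, so the expected value must be a str.
# Assumption: latin-1 stands in for the ISO8859-1 locale encoding.
as_str = ascii([arg.decode('latin-1')])

print(as_bytes)
print(as_str)

# latin-1 is a one-to-one byte mapping, so the original bytes are recoverable.
assert arg.decode('latin-1').encode('latin-1') == arg
```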
----------
components: Interpreter Core, Tests
messages: 323214
nosy: Michael.Felt
priority: normal
severity: normal
status: open
title: AIX: test_utf8_mode.test_cmd_line fails
type: behavior
versions: Python 3.7, Python 3.8
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34347>
_______________________________________