[Python-Dev] cpython: Issue #16218: skip test if filesystem doesn't support required encoding

R. David Murray rdmurray at bitdance.com
Thu Nov 8 00:08:45 CET 2012


On Wed, 07 Nov 2012 23:47:13 +0100, Victor Stinner <victor.stinner at gmail.com> wrote:
> 2012/11/7 Alexandre Vassalotti <alexandre at peadrop.com>:
> > The Unicode code points in the U+DC00-DFFF range (low surrogate area) can't
> > be encoded in UTF-8. Quoting from RFC 3629:
> >
> > The definition of UTF-8 prohibits encoding character numbers between U+D800
> > and U+DFFF, which are reserved for use with the UTF-16 encoding form (as
> > surrogate pairs) and do not directly represent characters.
> >
> >
> > It looks like this test was doing something specific with regard to this.
> > So, I am curious as well about this change.
> 
> os.fsencode() uses the surrogateescape error handler (PEP 383) on UNIX.
> 
> >>> os.fsencode('\udcf1\udcea\udcf0\udce8\udcef\udcf2')
> b'\xf1\xea\xf0\xe8\xef\xf2'
> 
> I replaced this arbitrary string (and other similar constant strings)
> with support.FS_NONASCII which is more portable (should be available
> on all locale encodings... except ASCII) and documented.
> 
> I rewrote test_cmd_line_script.test_non_ascii() (and other tests) in
> Python 3.4 to use support.FS_NONASCII.
> 
> This change should improve code coverage on heterogeneous environments.

Alexandre's point was that the string did not appear to be arbitrary,
but rather appeared to specifically be a string containing surrogates.
Is this not the case?

--David
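
For reference, here is a minimal sketch of the behaviour under discussion.
It assumes the original constant was chosen to exercise surrogateescape,
which the thread does not confirm. It uses the codecs directly instead of
os.fsencode(), so the result does not depend on the locale encoding. The
variable names are illustrative only.

```python
# Bytes that are not valid ASCII (same bytes as in the os.fsencode() example).
raw = b'\xf1\xea\xf0\xe8\xef\xf2'

# With surrogateescape, each undecodable byte 0xNN (>= 0x80) becomes the
# lone low surrogate U+DCNN instead of raising an error.
escaped = raw.decode('ascii', 'surrogateescape')

# Encoding with the same error handler restores the original bytes.
round_tripped = escaped.encode('ascii', 'surrogateescape')

# Strict UTF-8 rejects lone surrogates, in line with the RFC 3629 quote.
try:
    escaped.encode('utf-8')
    strict_utf8_ok = True
except UnicodeEncodeError:
    strict_utf8_ok = False
```

So the string in the old test is exactly what surrogateescape produces from
those six bytes, and it cannot be encoded as UTF-8 without that handler.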

