[pypy-issue] Issue #2441: Unicode surrogate codepoints in string literals combined when using unittest module (pypy/pypy)

byllyfish issues-reply at bitbucket.org
Tue Nov 29 16:12:39 EST 2016


New issue 2441: Unicode surrogate codepoints in string literals combined when using unittest module
https://bitbucket.org/pypy/pypy/issues/2441/unicode-surrogate-codepoints-in-string

byllyfish:

When running unit tests under PyPy3, I sometimes see unicode literals containing surrogate pairs combined into a single non-BMP character.

Here is `test_surrogate.py`:

```python
import unittest

class TestSurrogate(unittest.TestCase):
    def test_surrogate(self):
        s = '\ud800\udc00'
        if len(s) != 2:
            raise ValueError(s.encode('raw-unicode-escape'))
```

The first time I run it, it works fine.

```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 -m unittest test_surrogate
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
```

When I run the test a second time, it fails: the surrogate pair has been replaced by the single character U+10000.

```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 -m unittest test_surrogate
E
======================================================================
ERROR: test_surrogate (test_surrogate.TestSurrogate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_surrogate.py", line 7, in test_surrogate
    raise ValueError(s.encode('raw-unicode-escape'))
ValueError: b'\\U00010000'

----------------------------------------------------------------------
Ran 1 test in 0.010s

FAILED (errors=1)
```

The failures continue until I `touch` the file. After that, the test succeeds on the first run and fails on every run after it. If I touch the file and run pypy3 with `-B` (don't write .py[co] files on import), every test run succeeds.
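The `-B` observation points at the cached bytecode. `.pyc` files hold code objects serialized with the `marshal` module, so one way to narrow this down is to check whether the literal survives a `marshal` round trip by itself. This is a minimal sketch that assumes the merge happens during marshal serialization, which is not confirmed here. The helper name `surrogates_survive_marshal` is hypothetical.

```python
import marshal

# Source text containing the escaped literal; compile() turns it into two lone surrogates.
SOURCE = "s = '\\ud800\\udc00'\n"


def surrogates_survive_marshal(source: str = SOURCE) -> bool:
    """Return True if the two-surrogate literal is intact after a marshal round trip."""
    # Compile the snippet the same way the import system would.
    code = compile(source, "<test_surrogate>", "exec")
    # Serialize and deserialize the code object, as happens when writing/reading a .pyc.
    restored = marshal.loads(marshal.dumps(code))
    namespace = {}
    exec(restored, namespace)
    s = namespace["s"]
    # An intact literal is still two separate code points, not U+10000.
    return len(s) == 2 and s == "\ud800\udc00"


print(surrogates_survive_marshal())
```

On an interpreter without the bug this prints `True`. An affected build would be expected to print `False` if marshal is the culprit.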
 
N.B. The problem does NOT occur when the same code is run directly as a script. I've only seen it when using unittest.

```python
# This small program always works fine!
s = '\ud800\udc00'
if len(s) != 2:
    raise ValueError(s.encode('raw-unicode-escape'))
```
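A likely reason the script variant always works is caching. A file run directly as `__main__` is compiled from source on every run and is never written to a `.pyc`. The test module imported by unittest, by contrast, is cached under `__pycache__` and loaded from that bytecode on later runs. The sketch below exercises only the cached path, without unittest. It compiles a throwaway module with `py_compile` and loads it back from the `.pyc` alone through `importlib.machinery.SourcelessFileLoader`. The module name `surrogate_mod` and the helper `load_from_pyc` are hypothetical. If the cache hypothesis holds, an affected interpreter would return one character here instead of two.

```python
import importlib.machinery
import importlib.util
import os
import py_compile
import tempfile

# Hypothetical module body containing the escaped surrogate-pair literal.
MODULE_SOURCE = "s = '\\ud800\\udc00'\n"


def load_from_pyc(source: str = MODULE_SOURCE) -> str:
    """Write `source` to a module, byte-compile it, and return `s` loaded from the .pyc only."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "surrogate_mod.py")
        pyc_path = os.path.join(tmp, "surrogate_mod.pyc")
        with open(src_path, "w", encoding="utf-8") as f:
            f.write(source)
        # Produce bytecode explicitly, as the import system does on a first import.
        py_compile.compile(src_path, cfile=pyc_path, doraise=True)
        # Load from the bytecode file alone, mimicking a later, cached import.
        loader = importlib.machinery.SourcelessFileLoader("surrogate_mod", pyc_path)
        spec = importlib.util.spec_from_loader("surrogate_mod", loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module.s


s = load_from_pyc()
print(len(s), [hex(ord(c)) for c in s])
```

On an unaffected interpreter this reports two code points, `0xd800` and `0xdc00`.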

I am running on Mac OS X 10.11.6.  Please let me know if you can reproduce this.

```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 --version
Python 3.3.5 (619c0d5af0e5, Oct 08 2016, 22:08:19)
[PyPy 5.5.0-alpha0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
```

