[Python-Dev] Difference in RE between 3.2 and 3.3 (or Aaron Swartz memorial)

Victor Stinner victor.stinner at gmail.com
Wed Mar 6 19:34:10 CET 2013


In short, the Unicode implementation was rewritten in Python 3.3 for
PEP 393. It's not surprising that minor details, such as which strings
are shared singletons, differ. You should not use "is" to compare
strings in Python, or your program may fail on other Python
implementations (like PyPy, IronPython, or Jython) or even on a
different CPython version.
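
A minimal sketch of the difference, assuming nothing beyond the standard
language semantics. The helper name same_text is made up for this
illustration. The "==" comparison is defined by the language, while the
result of "is" depends on whether the implementation happens to reuse
the same object:

```python
def same_text(a, b):
    """Compare two strings by value; this is the portable check."""
    return a == b


# Build an equal string at run time so it is not just the same literal.
expected = "abc"
built = "".join(["ab", "c"])

# Value equality: defined by the language, True on every implementation.
print(same_text(built, expected))

# Identity: an implementation detail, may be True or False.
print(built is expected)
```

Only the first result is something a program can rely on.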

Anyway, you spotted a missed optimization: it's now "fixed" in Python
3.3 and 3.4 by the following commits. Copy/paste of the CIA IRC bot:

19:30 < irker555> cpython: Victor Stinner 3.3 * 82517:3dd2fa78fb89 /
                  _PyUnicode_Writer() now also reuses Unicode singletons:
                  empty string and latin1 single character
                  http://hg.python.org/cpython/rev/3dd2fa78fb89
19:30 < irker032> cpython: Victor Stinner default * 82518:fa59a85b373f /
                  Objects/unicodeobject.c: (Merge 3.3) _PyUnicode_Writer()
                  now also reuses Unicode singletons: empty string and
                  latin1 single character
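
To see what a given interpreter does with the two cases named in the
commit message (empty string and single latin1 character), a small
probe along these lines may help. It is only a sketch: the function
name probe_singletons is hypothetical, slicing is just one of several
ways to build a string at run time, and the printed values are expected
to vary between implementations and versions:

```python
import sys


def probe_singletons():
    """Report whether run-time-built strings share identity with literals."""
    empty_literal = ""
    char_literal = "a"
    # Slice a variable so the compiler cannot fold the result into a constant.
    source = "abc"
    empty_built = source[:0]
    char_built = source[:1]
    return {
        "empty string reused": empty_built is empty_literal,
        "latin1 char reused": char_built is char_literal,
    }


if __name__ == "__main__":
    print(sys.version)
    for label, reused in probe_singletons().items():
        print(label, reused)
```

Whatever it prints, the result describes one interpreter build and is
not a guarantee that code should depend on.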


2013/3/6 Amaury Forgeot d'Arc <amauryfa at gmail.com>:
>> So, in the end, I went the long way and bisected cpython to
>> find the commit which broke my tests, and it seems that the
>> culprit is http://hg.python.org/cpython/rev/123f2dc08b3e so it is
>> clearly something Unicode related.
>> Unfortunately, it really doesn't tell me what exactly is broken
>> (is it a known regression?) or whether there is a known workaround.
>> Could anybody suggest a way how to find bugs on
>> http://bugs.python.org related to some particular commit (plain
>> search for 123f2dc0 didn’t find anything).
> I strongly suspect an incorrect usage of the "is" operator:
> https://github.com/mcepl/html2text/blob/master/html2text.py#L95
> Identity of strings is not guaranteed...
> Does it change something if you use "==" instead?
> --
> Amaury Forgeot d'Arc
