
I submitted a number of patches which fix the currently broken Unicode-disabled build of Python 2.7 (built with the --disable-unicode configure option). I suppose this build has been broken since 2.7, when the C implementation of the io module was introduced.

http://bugs.python.org/issue21833 -- main patch which fixes the io module and adds helpers for testing.
http://bugs.python.org/issue21834 -- a lot of minor fixes for tests.

The following issues fix different modules and related tests:

http://bugs.python.org/issue21854 -- cookielib
http://bugs.python.org/issue21838 -- ctypes
http://bugs.python.org/issue21855 -- decimal
http://bugs.python.org/issue21839 -- distutils
http://bugs.python.org/issue21843 -- doctest
http://bugs.python.org/issue21851 -- gettext
http://bugs.python.org/issue21844 -- HTMLParser
http://bugs.python.org/issue21850 -- httplib and SimpleHTTPServer
http://bugs.python.org/issue21842 -- IDLE
http://bugs.python.org/issue21853 -- inspect
http://bugs.python.org/issue21848 -- logging
http://bugs.python.org/issue21849 -- multiprocessing
http://bugs.python.org/issue21852 -- optparse
http://bugs.python.org/issue21840 -- os.path
http://bugs.python.org/issue21845 -- plistlib
http://bugs.python.org/issue21836 -- sqlite3
http://bugs.python.org/issue21837 -- tarfile
http://bugs.python.org/issue21835 -- Tkinter
http://bugs.python.org/issue21847 -- xmlrpc
http://bugs.python.org/issue21841 -- xml.sax
http://bugs.python.org/issue21846 -- zipfile

Most fixes are trivial and are only several lines of code.

Hi,

I don't know anyone building Python without Unicode. I would prefer to modify configure to raise an error, and drop the #ifdef in the code. (Stop supporting building Python 2 without Unicode.)

Building Python 2 without Unicode support is not an innocent change. Python is moving strongly to Unicode: Python 3 uses Unicode by default. So to me it sounds really weird to work on building Python 2 without Unicode support. It means that you may have "Python 2" and "Python 2 without Unicode" which are not exactly the same language. IMO u"unicode" is part of the Python 2 language.

--disable-unicode is an old option added while Python 1.5 was very slowly moving to Unicode.

I have the same opinion on the --without-thread option (we should stop supporting it, this option is useless). I worked in the embedded world, where Python was used for the UI of a TV set top box. Even if the hardware was slow and old, Python was compiled with threads and Unicode. Unicode was mandatory to correctly handle letters with diacritics; threads were used to handle network and D-Bus, for example.

Victor

2014-06-24 10:22 GMT+02:00 Serhiy Storchaka <storchaka@gmail.com>:

I can't see any reason to make a backwards-incompatible change to Python 2 to only support Unicode. You're bound to break somebody's setup. Wouldn't it be better to fix bugs as Serhiy has done? Skip

2014-06-24 13:04 GMT+02:00 Skip Montanaro <skip@pobox.com>:
According to the long list of issues, I don't think that it's possible to compile and use the Python stdlib when Python is compiled without Unicode support. So I'm not sure that we can say that it's a backward-incompatible change.

Who is somebody? Who compiles Python without Unicode support? Which version of Python?

With Python 2.6, ./configure --disable-unicode fails with: "checking what type to use for unicode... configure: error: invalid value for --enable-unicode. Use either ucs2 or ucs4 (lowercase)." So I'm not sure that anyone used this option recently.

The configure script was fixed 2 years ago in Python 2.7 (2 years after the release of Python 2.7.0):
http://hg.python.org/cpython/rev/d7aff4423172
http://bugs.python.org/issue21833

"./configure --disable-unicode" works on Python 2.5.6: the unicode type doesn't exist, and u'abc' is a bytes string. It works with Python 2.7.7+ too.

Victor

24.06.14 14:50, Victor Stinner написав(ла):
Python has about 300 modules; my patches fix about 30 modules (only 8 of them cause a compile error). And that's almost all. Only pickle, json, etree, email and the unicode-specific modules (codecs, unicodedata and encodings) are left. Besides pickle, I'm not sure that the others can be fixed.

The fact that only a small fraction of modules needs fixes means that Python without unicode support can be pretty usable.

The main problem was with testing itself. The test suite depends on tempfile, which now uses io.open, which didn't work without unicode support (at least since 2.7).
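As an illustration of the kind of test guard discussed in this thread, here is a minimal sketch of a skip decorator for unicode-dependent tests. The names `have_unicode`, `requires_unicode` and `PathTests` are assumptions chosen for this example, not necessarily the helpers added by the patches. The sketch also runs on Python 3, where the `unicode` builtin is absent, so the guarded test is simply skipped there.

```python
import unittest

# Detect whether the interpreter provides the ``unicode`` builtin.
# It is missing in a --disable-unicode build (and also in Python 3).
try:
    unicode
except NameError:
    have_unicode = False
else:
    have_unicode = True

# Hypothetical helper name: skips a test when there is no unicode type.
requires_unicode = unittest.skipUnless(have_unicode, "requires unicode support")


class PathTests(unittest.TestCase):
    def test_str_join(self):
        # Works in every build: only native str objects are involved.
        self.assertEqual("/".join(["a", "b"]), "a/b")

    @requires_unicode
    def test_unicode_join(self):
        # Only meaningful when the unicode type exists.
        joined = unicode("/").join([unicode("a"), unicode("b")])
        self.assertEqual(joined, unicode("a/b"))


def run_tests():
    """Run PathTests and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PathTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_tests()` reports the guarded test as skipped rather than failed when `unicode` is unavailable, which is the behaviour a test suite needs in order to pass on both kinds of build.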

In article <1403625970.6550.133062453.693ECDEA@webmail.messagingengine.com>, Benjamin Peterson <benjamin@python.org> wrote:
That's why I'm concerned about applying these 20+ patches that touch many parts of the code base. I don't have any objection to the "arcane feature" per se and I appreciate the obvious effort that Serhiy put into the patches but, at this stage of the life of Python 2, our overriding concern should be stability. That's really why most users of Python 2.7 continue to use it. As I see it, maintenance mode is a promise from us to our users that we will try our best, in general, to only make changes that fix serious problems, either due to bugs in Python itself or changes in the external world (new OS releases, etc). We don't automatically fix all bugs. Any time we make a change, we're making an engineering decision with cost-benefit tradeoffs. The more lines of code changed, the greater the risk that we introduce new bugs; inadvertently adding regressions has been an issue over a number of the 2.7.x releases, including the most recent one.

The cost-benefit of this set of changes seems to me to be:

Costs:
- Code changes in many modules:
  - careful review -> additional work for multiple core developers
  - careful testing on all platforms including this option that we don't currently test at all, AFAIK -> added work for platform experts
  - risk of regressions not caught prior to release, at worst requiring another early followup release -> added work for release team, third-party packagers, users
  - possibly making backporting of other issues more difficult due to merge conflicts
  - possible invalidation of waiting-for-review patches forcing patch refreshes and retests -> added work for potential contributors
  - possible invalidation of user local patches -> added work for users
- may encourage use of an apparently little-used feature that has no equivalent in Python 3, another incentive to stay with Py2?

Benefit:
- Fixes documented feature that may be of benefit to users of Python in applications with very limited memory available, although there aren't any open issues from users requesting this (AFAIK). No benefit to the overwhelming majority of Python users, who only use Unicode-enabled builds.

That just doesn't seem like a good trade-off to me. I'll certainly abide by the release manager's decision but I think we all need to be thinking more about these kinds of cost-benefit tradeoffs and recognize that there are often non-obvious costs of making changes, costs that can affect our entire community. Yes, we are committed to maintaining Python 2.7 for multiple years but that doesn't mean we have to fix every open issue or even most open issues. Any or all of the above costs may apply to any changes we make. For many of our users, the best maintenance policy for Python 2.7 would be the least change possible.

-- Ned Deily, nad@acm.org

On 25 Jun 2014 07:05, "Ethan Furman" <ethan@stoneleaf.us> wrote:
it. If a bug has been there for a while, the affected users are probably working around it by now. ;)

Aye, in this case, I'm in the "officially deprecate the feature" camp. Don't actively try to break it further, just slap a warning in the docs to say it is no longer a supported configuration.

In my own personal case, I not only wasn't aware that there was still an option to turn off the Unicode support, but I also wouldn't really class a build with it turned off as still being Python. As Jim noted, there are quite a lot of APIs that don't make sense if there's no Unicode type available.

Cheers, Nick.

25.06.14 16:29, Victor Stinner написав(ла):
In posixpath, the branches for unicode and str should be reversed. In multiprocessing, .encode('utf-8') is applied to an already utf-8 encoded str (this is a unicode string in Python 3). And there is a similar error in at least one other place. Tests for bytearray actually test bytes, not bytearray. That is what I remember.
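To make the multiprocessing problem concrete, here is a small sketch of why calling .encode('utf-8') on an already encoded Python 2 str is fragile. It assumes the Python 2 default encoding (ASCII): there, str.encode() first decodes the bytes implicitly with the default codec and only then encodes. The helper `py2_style_encode` is hypothetical and spells out that implicit step so the example also runs on Python 3.

```python
def py2_style_encode(raw_bytes, encoding="utf-8"):
    """Emulate Python 2's str.encode(): implicit ASCII decode, then encode.

    Hypothetical helper for illustration only; it assumes the default
    encoding is ASCII, as in a stock Python 2 interpreter.
    """
    text = raw_bytes.decode("ascii")  # the step Python 2 performs implicitly
    return text.encode(encoding)


# Pure ASCII data survives the round trip, which hides the bug in most tests.
ascii_result = py2_style_encode(b"cafe")

# UTF-8 bytes for a non-ASCII string fail in the implicit decode step.
already_encoded = b"caf\xc3\xa9"
try:
    py2_style_encode(already_encoded)
except UnicodeDecodeError as exc:
    failure = exc
else:
    failure = None
```

With ASCII-only input the redundant encode is a no-op, so the mistake only surfaces once non-ASCII data is passed through.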

On 26 Jun 2014 01:13, "Serhiy Storchaka" <storchaka@gmail.com> wrote:
is unicode string in Python 3). And there is similar error in at least one other place. Tests for bytearray actually test bytes, not bytearray. That is what I remember.

OK, *that* sounds like an excellent reason to keep the Unicode disabled builds functional, and make sure they stay that way with a buildbot: to help make sure we're not accidentally running afoul of the implicit interoperability between str and unicode when backporting fixes from Python 3.

Helping to ensure correct handling of str values makes this capability something of benefit to *all* Python 2 users, not just those that turn off the Unicode support. It also makes it a potentially useful testing tool when assessing str/unicode handling in general.

Regards, Nick.

Le 25/06/2014 19:28, Nick Coghlan a écrit :
Hmmm... From my perspective, trying to enforce unicode-disabled builds will only lower the (already low) chance that I may want to write / backport bug fixes for 2.7. For the same reason, I agree with Victor that we should ditch the threading-disabled builds. It's too much of a hassle for no actual, practical benefit. People who want a threadless unicodeless Python can install Python 1.5.2 for all I care. Regards Antoine.

On Thu, Jun 26, 2014 at 9:04 PM, Antoine Pitrou <antoine@python.org> wrote:
Or some other implementation of Python. It's looking like micropython will be permanently supporting a non-Unicode build (although I stepped away from the project after a strong disagreement over what would and would not make sense, and haven't been following it since). If someone wants a Python that doesn't have stuff that the core CPython devs treat as essential, s/he probably wants something like uPy anyway. ChrisA

Hello, On Thu, 26 Jun 2014 22:49:40 +1000 Chris Angelico <rosuav@gmail.com> wrote:
Yes.
Your patches with my further additions were finally merged. Unicode strings still cannot be enabled by default due to https://github.com/micropython/micropython/issues/726 . Any help with reviewing/testing what's currently available is welcome.
I hinted at it during previous discussions of MicroPython, and would like to say it again: MicroPython has already embraced a lot of ideas rejected from CPython, like GC-only operation (which alone is not something to be proud of, but can you start up and do something in a 2K heap?) or tagged pointers (https://mail.python.org/pipermail/python-dev/2004-July/046139.html). So it should be a good vehicle for trying any unorthodox ideas(*) or implementations.

* MicroPython already implements intra-module constants, for example.

-- Best regards, Paul mailto:pmiscml@gmail.com

2014-06-26 13:04 GMT+02:00 Antoine Pitrou <antoine@python.org>:
By the way, adding a buildbot for testing Python without thread support is not enough. The buildbot has been broken for more than a month and nobody noticed :-p
http://buildbot.python.org/all/builders/AMD64%20Fedora%20without%20threads%2...

Ok, I noticed, but I consider that I have spent too much time on this minor use case. I prefer to leave such a task to someone else :-)

Victor

On Sat, Jun 28, 2014 at 2:51 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
I opened http://bugs.python.org/issue21755 a couple of weeks ago to fix the test.

--Berker

On 6/24/2014 4:22 AM, Serhiy Storchaka wrote:
It has frequently been broken. Without a buildbot, it will continue to break.

I have given at least a quick look at all your proposed changes; most are fixes to test code, such as skip decorators. People checked in tests without the right guards because it did work on their own builds, and on all stable buildbots. That will probably continue to happen unless/until a --disable-unicode buildbot is added.

It would be good to fix the tests (and actual library issues). Unfortunately, some of the specifically proposed changes (such as defining and using _unicode instead of unicode within python code) look to me as though they would trigger problems in the normal build (where the unicode object *does* exist, but would no longer be used).

Other changes, such as the use of \x escapes, appear correct, but make the tests harder to read -- and might end up removing a test for correct unicode functionality across different spellings.

Even if we assume that the tests are fine, and I'm just an idiot who misread them, the fact that there is any confusion means that these particular changes may be tricky enough to be a bad tradeoff for 2.7.

It *might* work if you could make a more focused change. For example, instead of leaving the 'unicode' name unbound, provide an object that simply returns false for isinstance and raises a UnicodeError for any other method call. Even *this* might be too aggressive for 2.7, but the fact that it would only appear in the --disable-unicode builds, and would make them more similar to the regular build, are points in its favor.

Before doing that, though, please document what the --disable-unicode mode is actually *supposed* to do when interacting with byte-streams that a standard defines as UTF-8. (For example, are the changes to _xml_dumps and _xml_loads at http://bugs.python.org/file35758/multiprocessing.patch correct, or do those functions assume they get bytes as input, or should the functions raise an exception any time they are called?)
-jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
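The placeholder object suggested above (false for isinstance, UnicodeError for everything else) could look roughly like the following sketch. The names `_MissingUnicodeType` and `unicode_placeholder` are hypothetical, and nothing like this exists in CPython as far as this thread shows; it only illustrates the proposal. Dunder lookups are left alone here so that ordinary introspection keeps working, which is an extra assumption not stated in the proposal.

```python
class _MissingUnicodeType(type):
    """Metaclass for a hypothetical stand-in for an absent ``unicode`` type."""

    _MESSAGE = "unicode support is disabled in this build"

    def __instancecheck__(cls, instance):
        # isinstance(x, placeholder) is always False: nothing is unicode.
        return False

    def __call__(cls, *args, **kwargs):
        # Constructing a value, e.g. placeholder("abc"), is refused.
        raise UnicodeError(cls._MESSAGE)

    def __getattr__(cls, name):
        # Only reached for attributes that do not otherwise exist.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        raise UnicodeError(cls._MESSAGE)


# Build the class by calling the metaclass, which works on Python 2 and 3.
unicode_placeholder = _MissingUnicodeType("unicode", (object,), {})


def describe(value):
    """Example caller: the isinstance check needs no try/except guard."""
    if isinstance(value, unicode_placeholder):
        return "unicode"
    return "not unicode"
```

Code written as `isinstance(value, unicode)` would then keep working unchanged in such a build, while any attempt to actually create or manipulate unicode values fails loudly instead of raising NameError.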

25.06.14 00:03, Jim J. Jewett написав(ла):
This is an idiom recommended by MvL [1] and widely used (19 times in the source code). [1] http://bugs.python.org/issue8767#msg159473
No, existing code uses a different approach. "unicode" doesn't exist, while the encode/decode methods exist but are useless. If my memory doesn't fail me, there is even a special explanatory comment about this historical decision somewhere. This decision was made many years ago.
Looking more carefully, I see that there is a bug in the unicode-enabled build (wrong backporting from 3.x). In 2.x xmlrpclib.dumps produces an already utf-8 encoded string, while in 3.x xmlrpc.client.dumps produces a unicode string. multiprocessing should fail with non-ascii str or unicode. A side benefit of my patches is that they expose existing errors in the unicode-enabled build.

participants (13)
- Antoine Pitrou
- Benjamin Peterson
- Berker Peksağ
- Chris Angelico
- Ethan Furman
- Jim J. Jewett
- Ned Deily
- Nick Coghlan
- Paul Sokolovsky
- Serhiy Storchaka
- Skip Montanaro
- Terry Reedy
- Victor Stinner