Well, we have the first 2.2 bugfix that isn't a no-brainer to port to 2.2.1. This is to do with the [ #495401 ] Build troubles: --with-pymalloc bug.

As far as I understand it, there were two problems.

1) with wide unicode characters, some function in unicodeobject.c to do with interpreting escape codes could write into memory it didn't own.
2) something to do with the handling of "unpaired high surrogates" in the utf-8 codec.

Were these problems related? I think they got fixed at the same time, but I may have gotten confused.

1) shouldn't be too much of an issue to get into 2.2.1 (there was some contention about which fix performed better, but for 2.2.1 I don't care too much).

2) is more troublesome, because to fix it properly breaks .pycs, in turn because marshal uses the utf-8 codec to store unicode string constants, and this is a no-no according to PEP 6.

Is it possible to worm around 2) by reconstructing valid strings from the bad marshal data, or has information been lost? How severe is the bug? Maybe it would be best to leave it unfixed in 2.2.1.

Basically, I guess I'm saying I'm too much of a unicode dunce to understand all the issues involved in fixing these problems in 2.2, so as unofficial bugfix-porter, I'd like someone else (Marc? Martin?) to port these particular fixes. If the mechanics of fiddling with the branch is too much, sending me patches is fine.

Cheers,
M.

-- 
This is the fixed point problem again; since all some implementors do is implement the compiler and libraries for compiler writing, the language becomes good at writing compilers and not much else!
  -- Brian Rogoff, comp.lang.functional
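Problem 1 amounts to writing past the end of a buffer. The thread does not say exactly which size estimate in unicodeobject.c was wrong, so the following is only a generic sketch in modern Python 3, using UTF-8 output sizing as a stand-in: a hypothetical allowance of 3 bytes per character holds for characters up to U+FFFF, but is too small once wide characters above U+FFFF can occur, since each of those needs 4 bytes.

```python
def utf8_len(cp):
    """Number of bytes UTF-8 uses for a single code point (1 to 4)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4

# Sanity check: the per-code-point table agrees with the real codec.
mixed = "a\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte characters
assert sum(utf8_len(ord(c)) for c in mixed) == len(mixed.encode("utf-8"))

wide = "\U0001F600" * 3                       # three characters above U+FFFF
needed = sum(utf8_len(ord(c)) for c in wide)  # bytes actually written
naive = 3 * len(wide)                         # hypothetical BMP-only estimate
safe = 4 * len(wide)                          # bound covering U+10000..U+10FFFF

print(needed, naive, safe)  # 12 9 12
```

A C routine that sized its output buffer from `naive` would write 3 bytes past the end for this input; sizing from `safe` avoids that.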
Michael Hudson wrote:
> Well, we have the first 2.2 bugfix that isn't a no-brainer to port to 2.2.1. This is to do with the [ #495401 ] Build troubles: --with-pymalloc bug.
> As far as I understand it, there were two problems.
> 1) with wide unicode characters, some function in unicodeobject.c to do with interpreting escape codes could write into memory it didn't own.
> 2) something to do with the handling of "unpaired high surrogates" in the utf-8 codec.
> Were these problems related? I think they got fixed at the same time, but I may have gotten confused.
Right. 1) was caused by 2). Both are fixed now.
> 1) shouldn't be too much of an issue to get into 2.2.1 (there was some contention about which fix performed better, but for 2.2.1 I don't care too much).
> 2) is more troublesome, because to fix it properly breaks .pycs, in turn because marshal uses the utf-8 codec to store unicode string constants, and this is a no-no according to PEP 6.
> Is it possible to worm around 2) by reconstructing valid strings from the bad marshal data, or has information been lost? How severe is the bug? Maybe it would be best to leave it unfixed in 2.2.1.
Well, I posted a message to python-dev or the checkins list about this (I don't remember which). The situation is basically like this:

In Python <= 2.2.0, you could write u = u"\uD800" in a .py file. The first time you import this file, Python will create a .pyc file for it using the broken UTF-8 encoding. The import will succeed.

The second time you import the module, Python will try to use the .pyc file. Now reading that file in fails with a UnicodeError, and Python does not fall back to the .py file either. As a result, modules using unpaired surrogates in Unicode literals are simply broken in Python <= 2.2.0.

The problem with backporting this patch is that in order for Python to properly recompile any broken module, the magic will have to be changed. Question is whether this is a reasonable thing to do in a patch level release...

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
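Marc-Andre's scenario is an encoder/decoder mismatch: the bytes written to the .pyc cannot be read back by the decoder. A minimal sketch of that mismatch in modern Python 3 (not 2.2, whose broken encoder may have produced different bytes), using the standard "surrogatepass" error handler as a stand-in for a lenient encoder:

```python
lone = "\ud800"  # an unpaired high surrogate, like u"\uD800" in the thread

# A strict UTF-8 encoder refuses the lone surrogate outright.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# "surrogatepass" stands in for a lenient encoder: it emits the
# three-byte sequence for U+D800 instead of raising.
cached = lone.encode("utf-8", "surrogatepass")

# A strict decoder rejects those bytes, much as loading the cached
# .pyc failed on the second import.
try:
    cached.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(strict_encode_ok, cached, strict_decode_ok)  # False b'\xed\xa0\x80' False
```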
M.-A. Lemburg writes:
> The problem with backporting this patch is that in order for Python to properly recompile any broken module, the magic will have to be changed. Question is whether this is a reasonable thing to do in a patch level release...
Guido can rule as he sees fit, but I don't see any reason *not* to change the magic number. This seems like a pretty important fix to me.

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation
Fred L. Drake, Jr. wrote:
> M.-A. Lemburg writes:
>> The problem with backporting this patch is that in order for Python to properly recompile any broken module, the magic will have to be changed. Question is whether this is a reasonable thing to do in a patch level release...
> Guido can rule as he sees fit, but I don't see any reason *not* to change the magic number. This seems like a pretty important fix to me.
The question is not whether it's an important fix, but whether the fix and its consequences are important enough to warrant changing the magic number. It's obviously possible for people to regen their .pyc files by deleting them, so I think we should wait for Guido to say "yes" before bumping the magic number, given that one of the cardinal points of the new bugfix process is that .pyc files will not be regenerated due to a bugfix release.

Note carefully that I do agree that it's a serious enough issue to consider the possibility of breaking that rule, but I think we can't afford to pull the trigger without Guido's specific buy-in. We'll also need to think about how we're going to market it if we do bump the magic number.

To me, then, the proper question is, "Is this an issue where *automatic* regeneration of .pyc files is sufficiently important?" (I don't know enough to have an opinion myself ;-), but I'll point out that the import failure means that at least it isn't a silent failure -- which I would absolutely agree needs a magic number bump.)

-- 
--- Aahz (@pobox.com)
Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista
We must not let the evil of a few trample the freedoms of the many.
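Bumping the magic number is what makes regeneration automatic: a .pyc whose header does not carry the running interpreter's magic is ignored and the module is recompiled from source. A minimal sketch using Python 3's `importlib.util.MAGIC_NUMBER` and `py_compile` (the 2.2-era .pyc header layout differs in detail); `is_usable` is a hypothetical helper, and the real import system also validates the source timestamp or hash:

```python
import importlib.util
import pathlib
import py_compile
import tempfile

def cached_magic(pyc_path):
    """Return the 4-byte magic number at the start of a .pyc file."""
    with open(pyc_path, "rb") as f:
        return f.read(4)

def is_usable(pyc_path):
    # Hypothetical helper: only trust a cache whose magic matches the
    # running interpreter's; anything else is stale and gets recompiled.
    return cached_magic(pyc_path) == importlib.util.MAGIC_NUMBER

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "mod.py"
    src.write_text("x = 1\n")
    pyc = py_compile.compile(str(src), cfile=str(pathlib.Path(tmp) / "mod.pyc"),
                             doraise=True)
    fresh = is_usable(pyc)

    # Simulate a .pyc written under a different magic number by flipping
    # every bit of its first byte.
    data = bytearray(pathlib.Path(pyc).read_bytes())
    data[0] ^= 0xFF
    pathlib.Path(pyc).write_bytes(bytes(data))
    stale = is_usable(pyc)

print(fresh, stale)  # True False
```

Deleting the .pyc by hand, as Aahz suggests, reaches the same end state without relying on the magic check.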
Aahz Maruch wrote:
> Fred L. Drake, Jr. wrote:
>> M.-A. Lemburg writes:
>>> The problem with backporting this patch is that in order for Python to properly recompile any broken module, the magic will have to be changed. Question is whether this is a reasonable thing to do in a patch level release...
>> Guido can rule as he sees fit, but I don't see any reason *not* to change the magic number. This seems like a pretty important fix to me.
> The question is not whether it's an important fix, but whether the fix and its consequences are important enough to warrant changing the magic number. It's obviously possible for people to regen their .pyc files by deleting them, so I think we should wait for Guido to say "yes" before bumping the magic number, given that one of the cardinal points of the new bugfix process is that .pyc files will not be regenerated due to a bugfix release.
We could of course ship the patch level release with the same magic number. Modules that haven't worked before will then start to work. Note that we haven't had *any* bug report directly related to this, so it's likely that no one has actually hit this bug in practice.
"Fred L. Drake, Jr." <fdrake@acm.org> writes:
> Guido can rule as he sees fit, but I don't see any reason *not* to change the magic number. This seems like a pretty important fix to me.
The memory-overwriting problem can be fixed without bumping the pyc magic. The rationale for bumping the pyc magic is pretty weak, IMO, so that aspect should not be propagated to 2.2.1. Regards, Martin
Martin v. Loewis writes:
> The memory-overwriting problem can be fixed without bumping the pyc magic. The rationale for bumping the pyc magic is pretty weak, IMO, so that aspect should not be propagated to 2.2.1.
I'm happy with that.

  -Fred
"M.-A. Lemburg" <mal@lemburg.com> writes:
> Right. 1) was caused by 2).
That wasn't actually the case. The overwriting of memory was really independent of the error in surrogate processing, and can be fixed independently.
> As a result, modules using unpaired surrogates in Unicode literals are simply broken in Python <= 2.2.0.
I think this is unimportant enough to just accept this bug for Python 2.2.x. If people ever run into the problem, well: just don't do this. Unpaired surrogates will be entirely illegal in Unicode 3.2.
> The problem with backporting this patch is that in order for Python to properly recompile any broken module, the magic will have to be changed. Question is whether this is a reasonable thing to do in a patch level release...
The memory-overwriting problem can be fixed independently, e.g. with https://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=15248&aid=495401 Regards, Martin
"Martin v. Loewis" wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>> Right. 1) was caused by 2).
> That wasn't actually the case. The overwriting of memory was really independent of the error in surrogate processing, and can be fixed independently.
In that case, it's probably best to just use this patch and leave the UTF-8 fix in 2.3 only.
> The memory-overwriting problem can be fixed independently, e.g. with
> https://sourceforge.net/tracker/download.php?group_id=5470&atid=105470&file_id=15248&aid=495401
participants (5)

- aahz@rahul.net
- Fred L. Drake, Jr.
- M.-A. Lemburg
- martin@v.loewis.de
- Michael Hudson