[Python-Dev] test_gzip/test_tarfile failure om AMD64

Bob Ippolito bob at redivi.com
Mon May 29 11:56:14 CEST 2006


On May 28, 2006, at 5:34 PM, Thomas Wouters wrote:

> On 5/29/06, Bob Ippolito <bob at redivi.com> wrote:
>
> On May 28, 2006, at 4:31 AM, Thomas Wouters wrote:
> >
> > I'm seeing a dubious failure of test_gzip and test_tarfile on my
> > AMD64 machine. It's triggered by the recent struct changes, but I'd
> > say it's probably caused by a bug/misfeature in zlibmodule:
> > zlib.crc32 is the result of a zlib 'crc32' functioncall, which
> > returns an unsigned long. zlib.crc32 turns that unsigned long into
> > a (signed) Python int, which means a number beyond 1<<31 goes
> > negative on 32-bit systems and other systems with 32-bit longs, but
> > stays positive on systems with 64-bit longs:
> >
> > (32-bit)
> > >>> zlib.crc32("foobabazr")
> > -271938108
> >
> > (64-bit)
> > >>> zlib.crc32("foobabazr")
> > 4023029188
> >
> > The old structmodule coped with that:
> > >>> struct.pack("<l", -271938108)
> > '\xc4\x8d\xca\xef'
> > >>> struct.pack("<l", 4023029188)
> > '\xc4\x8d\xca\xef'
> >
> > The new one does not:
> > >>> struct.pack("<l", -271938108)
> > '\xc4\x8d\xca\xef'
> > >>> struct.pack("<l", 4023029188)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "Lib/struct.py", line 63, in pack
> >     return o.pack(*args)
> > struct.error: 'l' format requires -2147483647 <= number <=  
> 2147483647
> >
> > The structmodule should be fixed (and a test added ;) but I'm also
> > wondering if zlib shouldn't be fixed. Now, I'm AMD64-centric, so my
> > suggested fix would be to change the PyInt_FromLong() call to
> > PyLong_FromUnsignedLong(), making zlib always return positive
> > numbers -- it might break some code on 32-bit platforms, but that
> > code is already broken on 64-bit platforms. But I guess I'm okay
> > with the long being changed into an actual 32-bit signed number on
> > 64-bit platforms, too.
>
> The struct module isn't what's broken here. All of the struct types
> have always had well defined bit sizes and alignment if you
> explicitly specify an endian, >I and >L are 32-bits everywhere, and
> >Q is supported on platforms that don't have long long. The only
> thing that's changed is that it actually checks for errors
> consistently now.
>
> Yes. And that breaks things. I'm certain the behaviour is used in  
> real-world code (and I don't mean just the gzip module.) It has  
> always, as far as I can remember, accepted 'unsigned' values for  
> the signed versions of ints, longs and long-longs (but not chars or  
> shorts.) I agree that that's wrong, but I don't think changing  
> struct to do the right thing should be done in 2.5. I don't even  
> think it should be done in 2.6 -- although 3.0 is fine.
>

Well, the behavior change is in response to a bug <http://python.org/ 
sf/1229380>. If nothing else, we should at least fix the standard  
library such that it doesn't depend on struct bugs. This is the only  
way to find them :)

Basically the struct module previously only checked for errors if you  
don't specify an endian. That's really strange and leads to very  
confusing results. The only code that really should be broken by this  
additional check is code that existed before Python had a long type  
and only signed values were available.

-bob

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-dev/attachments/20060529/702d1314/attachment-0001.htm 


More information about the Python-Dev mailing list