On May 28, 2006, at 5:34 PM, Thomas Wouters wrote:

On 5/29/06, Bob Ippolito <bob@redivi.com> wrote:

On May 28, 2006, at 4:31 AM, Thomas Wouters wrote:
>
> I'm seeing a dubious failure of test_gzip and test_tarfile on my
> AMD64 machine. It's triggered by the recent struct changes, but I'd
> say it's probably caused by a bug/misfeature in zlibmodule:
> zlib.crc32 is the result of a zlib 'crc32' functioncall, which
> returns an unsigned long. zlib.crc32 turns that unsigned long into
> a (signed) Python int, which means a number beyond 1<<31 goes
> negative on 32-bit systems and other systems with 32-bit longs, but
> stays positive on systems with 64-bit longs:
>
> (32-bit)
> >>> zlib.crc32("foobabazr")
> -271938108
>
> (64-bit)
> >>> zlib.crc32("foobabazr")
> 4023029188
>
> The old structmodule coped with that:
> >>> struct.pack("<l", -271938108)
> '\xc4\x8d\xca\xef'
> >>> struct.pack("<l", 4023029188)
> '\xc4\x8d\xca\xef'
>
> The new one does not:
> >>> struct.pack("<l", -271938108)
> '\xc4\x8d\xca\xef'
> >>> struct.pack("<l", 4023029188)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "Lib/struct.py", line 63, in pack
>     return o.pack(*args)
> struct.error: 'l' format requires -2147483647 <= number <= 2147483647
>
> The structmodule should be fixed (and a test added ;) but I'm also
> wondering if zlib shouldn't be fixed. Now, I'm AMD64-centric, so my
> suggested fix would be to change the PyInt_FromLong() call to
> PyLong_FromUnsignedLong(), making zlib always return positive
> numbers -- it might break some code on 32-bit platforms, but that
> code is already broken on 64-bit platforms. But I guess I'm okay
> with the long being changed into an actual 32-bit signed number on
> 64-bit platforms, too.

The struct module isn't what's broken here. All of the struct types
have always had well defined bit sizes and alignment if you
explicitly specify an endian, >I and >L are 32-bits everywhere, and
>Q is supported on platforms that don't have long long. The only
thing that's changed is that it actually checks for errors
consistently now.

Yes. And that breaks things. I'm certain the behaviour is used in real-world code (and I don't mean just the gzip module.) It has always, as far as I can remember, accepted 'unsigned' values for the signed versions of ints, longs and long-longs (but not chars or shorts.) I agree that that's wrong, but I don't think changing struct to do the right thing should be done in 2.5. I don't even think it should be done in 2.6 -- although 3.0 is fine.


Well, the behavior change is in response to a bug <http://python.org/sf/1229380>. If nothing else, we should at least fix the standard library such that it doesn't depend on struct bugs. This is the only way to find them :)

Basically the struct module previously only checked for errors if you don't specify an endian. That's really strange and leads to very confusing results. The only code that really should be broken by this additional check is code that existed before Python had a long type and only signed values were available.

-bob