Exact integral types in struct
The struct module works only with native, platform-specific integers. As a result, a lot of code in the standard library and in third-party applications is forced either to rely on unreliable assumptions (short is always 2 bytes, long is always 4 bytes), which do not always hold, or to construct the integer explicitly from bytes (b[0]+(b[1]<<8)+(b[2]<<16)+...). I propose introducing special format notation for signed and unsigned integers of an arbitrary exact size (given as a number following a prefix). After that, we could eliminate the use of platform-specific formats when working with platform-independent data (such as zip, for example). Or maybe I'm behind the times, the corresponding functions already exist, and the remaining uses of the struct module are only remnants?
Using the '<' and '>' prefixes in the format string, you can force standard size and alignment. Do you have a specific bit of code in the stdlib in mind that is incorrectly using native alignment?
(In reply to Serhiy Storchaka <storchaka@gmail.com>, Tue, Mar 20, 2012 at 11:26 AM.)
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
-- --Guido van Rossum (python.org/~guido)
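A minimal sketch of the point about prefixes, using only documented struct calls. The variable names are illustrative, and the native sizes are printed rather than asserted because they vary by platform.

```python
import struct

# With '<' (little-endian) or '>' (big-endian), struct uses its own
# fixed "standard" sizes, independent of the platform's C compiler.
standard_sizes = {code: struct.calcsize('<' + code) for code in 'hilq'}

# Without a prefix, sizes follow the platform's C types, so 'l' may be
# 4 bytes on one system and 8 on another.
native_sizes = {code: struct.calcsize(code) for code in 'hilq'}

# Reading a 32-bit unsigned little-endian field, such as the signature
# at the start of a zip local file header:
(signature,) = struct.unpack('<I', b'PK\x03\x04')

print(standard_sizes)
print(native_sizes)
print(hex(signature))
```

With an explicit prefix, 'h', 'i', 'l' and 'q' are 2, 4, 4 and 8 bytes everywhere, which is what platform-independent formats need.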
20.03.12 20:32, Guido van Rossum wrote:
Using the '<' and '>' prefixes in the format string, you can force standard size and alignment. Do you have a specific bit of code in the stdlib in mind that is incorrectly using native alignment?
Hmm. It seems I had a moment of temporary insanity. I only just noticed that these prefixes determine not only endianness but also size. My fault. Sorry for the noise. However, the trick with struct.unpack('dd') in Lib/json/decoder.py amazes me.
IEEE 754 floating-point values don't depend on machine byte order, and a C double is always encoded in 8 bytes, as far as I know.
(In reply to Serhiy Storchaka <storchaka@gmail.com>, Tue, Mar 20, 2012 at 9:48 PM.)
-- Thanks, Andrew Svetlov
20.03.12 21:54, Andrew Svetlov wrote:
IEEE 754 floating-point values don't depend on machine byte order, and a C double is always encoded in 8 bytes, as far as I know.
Full code:

    def _floatconstants():
        _BYTES = binascii.unhexlify(b'7FF80000000000007FF0000000000000')
        if sys.byteorder != 'big':
            _BYTES = _BYTES[:8][::-1] + _BYTES[8:][::-1]
        nan, inf = struct.unpack('dd', _BYTES)
        return nan, inf, -inf

    NaN, PosInf, NegInf = _floatconstants()

But in xdrlib.py:

    return struct.unpack('>d', data)[0]

And in pickle.py:

    self.append(unpack('>d', self.read(8))[0])

Test:

    >>> import struct
    >>> struct.pack('>d', 1)
    b'?\xf0\x00\x00\x00\x00\x00\x00'
    >>> struct.pack('<d', 1)
    b'\x00\x00\x00\x00\x00\x00\xf0?'
Sorry, my fault. But as you can see, the json library switches the byte order manually, so it's not an error. Obviously it would be cleaner to use the '>d' form directly. Please file an issue in the bug tracker if you want.
(In reply to Serhiy Storchaka <storchaka@gmail.com>, Tue, Mar 20, 2012 at 10:27 PM.)
-- Thanks, Andrew Svetlov
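A minimal sketch of the "direct '>d' form" suggested above, assuming the same hex constant as the quoted json code. It is an illustration only, not the actual change made in the standard library.

```python
import binascii
import math
import struct

_BYTES = binascii.unhexlify(b'7FF80000000000007FF0000000000000')

# '>dd' reads two big-endian IEEE 754 doubles regardless of the host's
# byte order, so no manual reversal of the bytes is needed.
nan, inf = struct.unpack('>dd', _BYTES)
print(math.isnan(nan), inf == float('inf'))

# The byte layout of a packed double does depend on the chosen prefix:
# the little-endian form is the big-endian form reversed.
big = struct.pack('>d', 1.0)
little = struct.pack('<d', 1.0)
print(big == little[::-1])
```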
You can also use "3s", then int.from_bytes. Strangely, the need for finer control of struct members has never come up; I guess this is the legacy of C.
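A minimal sketch of the "3s" plus int.from_bytes approach. The record layout (a 3-byte little-endian integer followed by a 1-byte flag) is a hypothetical example, not taken from any real format.

```python
import struct

# Hypothetical record: a 3-byte little-endian unsigned integer followed
# by a 1-byte flag.
record = b'\x01\x02\x03\xff'

# '3s' extracts the three bytes unparsed; 'B' reads the flag.
raw, flag = struct.unpack('<3sB', record)
value = int.from_bytes(raw, 'little')

# Signed values work too, via the signed keyword argument.
negative = int.from_bytes(b'\xff\xff\xfe', 'big', signed=True)

# Packing goes the other way with int.to_bytes.
repacked = struct.pack('<3sB', value.to_bytes(3, 'little'), flag)

print(value, flag, negative, repacked == record)
```

This covers integer widths that struct has no format character for, at the cost of a separate conversion step per field.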
21.03.12 01:46, Matt Joiner wrote:
You can also use "3s", then int.from_bytes. Strangely, the need for finer control of struct members has never come up; I guess this is the legacy of C.
Thank you. I'm not very familiar with the latest API. I think the documentation for the struct module should clarify what the *standard* size and alignment are. The struct module is widely used in the standard library for working with binary formats (aifc, base64, binhex, compileall, dbm, gettext, gzip, idlelib, logging, modulefinder, msilib, pickle, wave, xdrlib, zipfile).
Serhiy Storchaka wrote:
I think the documentation for the struct module should clarify what the *standard* size and alignment are.
Yes, it's a bit perplexing the way it casually throws in the word "standard" without any elaboration at that point. It leaves the reader wondering -- which standard? It sounds like it's referring to some widely-recognised standard that the reader is assumed to already know about, whereas it's actually something made up for the struct module.
Also, I think it would help to point out that the standard is designed to be platform-independent as well as compiler-independent. One can infer this from what is said, but it wouldn't hurt to point it out explicitly.
-- Greg
On 21/03/2012 07:31, Greg Ewing wrote:
Serhiy Storchaka wrote:
I think the documentation for the struct module should clarify what the *standard* size and alignment are.
Yes, it's a bit perplexing the way it casually throws in the word "standard" without any elaboration at that point. It leaves the reader wondering -- which standard? It sounds like it's referring to some widely-recognised standard that the reader is assumed to already know about, whereas it's actually something made up for the struct module.
I don’t see this problem when reading the documentation. The idea of "standard" size is introduced in section 7.3.2.1:
Standard size depends only on the format character; see the table in the Format Characters section.
The table in the next section has a "Standard size" column. For example, the size for "@i" (native size) is variable, but "=i" (standard size) is always 4 bytes.
http://docs.python.org/library/struct.html#byte-order-size-and-alignment
http://docs.python.org/library/struct.html#format-characters
Maybe the docs should not use the word "standard". But it is self-contained: it does not refer to an external standard. As to alignment, the table in 7.3.2.1 is pretty clear that "standard alignment" is no alignment at all.
-- Simon Sapin
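A minimal sketch of the "@i" versus "=i" distinction, using only struct.calcsize. The variable names are illustrative, and the native results are printed rather than asserted to be a particular value because they depend on the platform's C compiler.

```python
import struct

# '=' selects native byte order but standard size and no alignment.
size_eq_i = struct.calcsize('=i')      # standard size of 'i'

# In standard mode there is no padding between the 1-byte 'b' and the
# 4-byte 'i', so the total is simply 1 + 4.
size_eq_bi = struct.calcsize('=bi')

# '@' uses the C compiler's sizes and alignment, so padding may be
# inserted before the 'i'. Omitting the prefix behaves the same as '@'.
size_native_bi = struct.calcsize('@bi')
size_default_bi = struct.calcsize('bi')

print(size_eq_i, size_eq_bi, size_native_bi, size_default_bi)
```

On many common platforms the native result is 8 rather than 5, because the int is aligned to a 4-byte boundary.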
On 21.03.2012 13:36, Serhiy Storchaka wrote:
21.03.12 12:14, Simon Sapin wrote:
I don’t see this problem when reading the documentation. The idea of "standard" size is introduced in section 7.3.2.1:
Again, it is all because of my carelessness. I looked at ``pydoc struct``, not at the library documentation.
Well, if "pydoc struct" is not self-contained and mentions "standard size" without defining it, that is still a bug. At the very least it would have to refer to the library docs for what the standard size is. Georg
On Thu, Mar 22, 2012 at 5:07 PM, Georg Brandl <g.brandl@gmx.net> wrote:
Well, if "pydoc struct" is not self-contained and mentions "standard size" without defining it, that is still a bug. At the very least it would have to refer to the library docs for what the standard size is.
The broader question of whether the docs might be better rephrased to say "default size" rather than "standard size" still stands, though (since 'default' is a more typical word than 'standard' for "we defined a value that is used automatically if you don't explicitly specify an alternative").
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Oh, but this is NOT a default! The default is system local. Agreed on clarifying the docstring.
--Guido
(In reply to Nick Coghlan <ncoghlan@gmail.com>, Thursday, March 22, 2012.)
-- --Guido van Rossum (python.org/~guido)
participants (8)
- Andrew Svetlov
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Matt Joiner
- Nick Coghlan
- Serhiy Storchaka
- Simon Sapin