[ python-Feature Requests-1023290 ] proposed struct module format code addition

SourceForge.net noreply at sourceforge.net
Tue Sep 7 02:02:28 CEST 2004


Feature Requests item #1023290, was opened at 2004-09-06 15:42
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1023290&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Josiah Carlson (josiahcarlson)
>Assigned to: Nobody/Anonymous (nobody)
Summary: proposed struct module format code addition

Initial Comment:
I believe there should be a mechanism to load and
unload arbitrarily large integers via the struct
module.  Currently, one would likely start with the 'Q'
format character, creating the integer in a block-wise
fashion with multiplies and shifts.
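
A minimal sketch of the block-wise 'Q' approach described
above, written for Python 3.  The function name
unpack_big_endian_blocks is hypothetical and is not part of
the struct module:

```python
import struct

def unpack_big_endian_blocks(data):
    """Build an unsigned integer from big-endian bytes, 8 bytes at a time."""
    # 'Q' always consumes exactly 8 bytes, so left-pad with zero bytes
    # until the length is a multiple of 8.
    padding = -len(data) % 8
    data = b"\x00" * padding + data
    value = 0
    for offset in range(0, len(data), 8):
        (block,) = struct.unpack(">Q", data[offset:offset + 8])
        # Shift the accumulated value up by 64 bits and merge in the block.
        value = (value << 64) | block
    return value
```

The padding direction and the shift width are the kind of
detail that is easy to get wrong when this is written by
hand.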

This is OK, though it tends to lend itself to certain
kinds of bugs.

There is currently another method for getting large
integers from strings and going back without the struct
module:

long(stri.encode('hex'), 16)
hex(inte)[2:].decode('hex')


Arguably, such things shouldn't be done for the packing
and unpacking of binary data in general (the string
slicing especially).
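
The two one-liners above are Python 2 idioms.  In Python 2,
hex() of a long appends a trailing 'L', and an odd number of
hex digits cannot be decoded, so the reverse direction is
fragile.  A rough Python 3 sketch of the same hex round
trip, assuming an explicit byte length is supplied, could
look like this (bytes_to_int and int_to_bytes are
hypothetical helper names):

```python
import binascii

def bytes_to_int(data):
    # Same idea as long(stri.encode('hex'), 16); empty input maps to 0.
    return int(binascii.hexlify(data), 16) if data else 0

def int_to_bytes(value, length):
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    # Zero-pad to exactly 2 * length hex digits so unhexlify never sees an
    # odd-length string and leading zero bytes are preserved.
    digits = "%0*x" % (2 * length, value)
    if len(digits) > 2 * length:
        raise OverflowError("value does not fit in %d bytes" % length)
    return binascii.unhexlify(digits)
```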


I propose a new format character for the struct module,
specifically because the struct module is to "Interpret
strings as packed binary data".  Perhaps 'g' and 'G'
(e.g. biGint) would do, though any reasonable
character should suffice.  Endianness should be
handled, and the number of bytes representing the
object would be the same as with the 's' formatting
code.  That is, '>60G' would be an unsigned big-endian
integer represented by 60 bytes (null filled if the
magnitude of the passed integer is not large enough).
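
To make the proposed semantics concrete, here is a sketch of
what '>60G' might do, emulated with int.to_bytes and
int.from_bytes (available in Python 3.2 and later).  The 'G'
code does not exist in the struct module; pack_unsigned and
unpack_unsigned are hypothetical stand-ins for it:

```python
def pack_unsigned(value, size, big_endian=True):
    """Hypothetical stand-in for struct.pack('>60G', value) when size == 60."""
    byteorder = "big" if big_endian else "little"
    # int.to_bytes null-fills up to `size` bytes and raises OverflowError
    # if the value is negative or needs more than `size` bytes.
    return value.to_bytes(size, byteorder)

def unpack_unsigned(data, big_endian=True):
    """Hypothetical stand-in for struct.unpack('>60G', data)[0]."""
    byteorder = "big" if big_endian else "little"
    return int.from_bytes(data, byteorder)
```

With these helpers, pack_unsigned(1, 60) yields 59 null bytes
followed by b"\x01", and passing big_endian=False reverses
the byte order.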

The only reason why one wouldn't want this
functionality in the struct module is "This module
performs conversions between Python values and C
structs represented as Python strings." and arbitrarily
large integers are not traditionally part of a C struct
(though I am sure many of us have implemented arbitrary
precision integers with structs).  The "not a C type"
argument has been used to quash the 'bit' and 'nibble'
format characters, on the grounds that "masks and shifts"
can emulate them.  Though "masks and shifts" could
also be used here, I have heard myself and others state
that there should be an easy method for converting
between large longs and strings.


A side-effect for allowing arbitrarily large integers
to be represented in this fashion is that its
functionality could, if desired, subsume the other
integer type characters, as well as fill in the gaps
for nonstandard size integers (3, 5, 6, 7 etc. byte
integers), that I (and I am sure others) have used in
various applications.
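
For illustration, one common workaround for a nonstandard
size today is to pad to the next standard width.  The sketch
below handles a 3-byte big-endian unsigned integer with the
existing 'I' code; the helper names are hypothetical:

```python
import struct

def unpack_uint24_be(data):
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pad on the most-significant side so the standard 4-byte 'I' applies.
    (value,) = struct.unpack(">I", b"\x00" + data)
    return value

def pack_uint24_be(value):
    if not 0 <= value < (1 << 24):
        raise OverflowError("value does not fit in 3 bytes")
    # Pack as 4 bytes, then drop the leading (always zero) byte.
    return struct.pack(">I", value)[1:]
```

Each odd width (5, 6, 7 bytes and so on) needs its own
padding logic with this approach, which is the gap a
variable-width format code would fill.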


Currently no implementation exists, and I don't have
time to do one now.  Having taken a look at
longobject.c and structmodule.c, I would likely be able
to make a patch to the documentation, structmodule.c,
and test_struct.py around mid October, if this
functionality is desirable to others and accepted.
While I doubt that a PEP for this is required, if
necessary I would write one up with a sample
implementation around mid October.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-09-06 19:02

Message:
Logged In: YES 
user_id=80475

Okay, submit a patch with docs and unittests.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-09-06 18:44

Message:
Logged In: YES 
user_id=341410

As I noted in the feature request, there is already a
method for translating string <-> long.

The problem with current methods for converting between
large integers and strings is that they are neither easy to
understand nor easy to document.

The struct module already provides two appropriate functions
for handling packed binary data and a natural place for
documenting the packing and unpacking of binary data.  The
implementation also seems simple enough: one more format
character, much of it borrowed from the 's' character, plus
a call to _PyLong_FromByteArray, seems to be sufficient.

As for the binascii module, many of the functions listed
seem like they should be wrapped into the encode/decode
string methods, hexlify already being so in str.encode('hex').

To me, just being able to translate doesn't seem sufficient
(we already can translate), but being able to do it well,
have it documented well, and placed in a location that is
obvious, fast and optimized for these kinds of things seems
to be the right thing.

From what I can tell, the only reason why struct doesn't
already have an equivalent format character to the proposed
'g' and 'G', is because the module was created to handle
packed C structs and seemingly "nothing else".  Considering
there doesn't seem to be any other reasonable or easily
documentable location for placing equivalent functionality
(both packing and unpacking), I am of the opinion that
restricting the packing and unpacking to C types in the
struct module (when there are other useful types) is overkill.

As I said, I will provide an implementation if desired.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-09-06 17:34

Message:
Logged In: YES 
user_id=80475

FWIW, I'm working on str/long conversion functions for the
binascii module.  Will that suit your needs?

The tolong function is equivalent to:
    long(hexlify(b), 16)

----------------------------------------------------------------------
