[ python-Feature Requests-1023290 ] proposed struct module format code addition

SourceForge.net noreply at sourceforge.net
Sun Sep 12 05:40:51 CEST 2004


Feature Requests item #1023290, was opened at 2004-09-06 16:42
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1023290&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Josiah Carlson (josiahcarlson)
Assigned to: Nobody/Anonymous (nobody)
Summary: proposed struct module format code addition

Initial Comment:
I believe there should be a mechanism to load and
unload arbitrarily large integers via the struct
module.  Currently, one would likely start with the 'Q'
format character, creating the integer in a block-wise
fashion with multiplies and shifts.

This is OK, though it tends to lend itself to certain
kinds of bugs.
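
For illustration only, here is a minimal sketch of the block-wise approach for a 16-byte big-endian unsigned integer, written for Python 3 rather than the Python 2 of this thread. The helper names unpack_u128_be and pack_u128_be are hypothetical and not part of struct.

```python
import struct

def unpack_u128_be(data):
    """Rebuild a 16-byte big-endian unsigned integer from two 'Q' blocks."""
    hi, lo = struct.unpack('>2Q', data)
    # The high block carries the upper 64 bits; forgetting or mis-sizing
    # this shift is the kind of bug the block-wise approach invites.
    return (hi << 64) | lo

def pack_u128_be(value):
    """Split an integer into two 64-bit blocks and pack them big-endian."""
    mask = (1 << 64) - 1
    return struct.pack('>2Q', (value >> 64) & mask, value & mask)
```

Every additional 8 bytes of width needs another block, another shift, and a longer format string, which is where the approach becomes error-prone.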

There is currently another method for getting large
integers from strings and going back without the struct
module:

long(stri.encode('hex'), 16)
hex(inte)[2:].decode('hex')


Arguably, such things shouldn't be done for the packing
and unpacking of binary data in general (the string
slicing especially).
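
A rough Python 3 equivalent of the hex round trip above might look like the sketch below. It assumes non-negative integers and big-endian byte order; the names bytes_to_int and int_to_bytes are hypothetical. Note the padding step: an odd number of hex digits cannot be decoded into whole bytes, so the integer-to-string direction needs more care than the one-liner suggests.

```python
from binascii import hexlify, unhexlify

def bytes_to_int(data):
    """Interpret a byte string as a big-endian unsigned integer via hex."""
    # int() needs at least one digit, so treat empty input as zero.
    return int(hexlify(data), 16) if data else 0

def int_to_bytes(value):
    """Convert a non-negative integer to the shortest big-endian byte string."""
    digits = '%x' % value
    # unhexlify rejects odd-length input, so pad to a whole number of bytes.
    if len(digits) % 2:
        digits = '0' + digits
    return unhexlify(digits)
```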


I propose a new format character for the struct module,
specifically because the struct module is to "Interpret
strings as packed binary data".  Perhaps 'g' and 'G'
(e.g. biGint) is sufficient, though any reasonable
character should suffice.  Endianness should be
handled, and the number of bytes representing the
object would be the same as with the 's' formatting
code.  That is, '>60G' would be an unsigned big-endian
integer represented by 60 bytes (null filled if the
magnitude of the passed integer is not large enough).
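
The 'g' and 'G' codes are only a proposal and do not exist in struct. As a sketch of the intended semantics of something like '>60G', the behaviour can be emulated in Python 3 with int.to_bytes and int.from_bytes (added in Python 3.2, long after this request). The function names pack_bigint and unpack_bigint are hypothetical.

```python
def pack_bigint(value, size, big_endian=True):
    """Emulate the proposed 'G' code: unsigned integer in exactly `size` bytes."""
    order = 'big' if big_endian else 'little'
    # to_bytes zero-fills small values and raises OverflowError if the
    # value does not fit in `size` bytes or is negative.
    return value.to_bytes(size, order)

def unpack_bigint(data, big_endian=True):
    """Inverse of pack_bigint."""
    return int.from_bytes(data, 'big' if big_endian else 'little')
```

Here pack_bigint(1, 60) would yield 59 null bytes followed by a single 0x01 byte, matching the null-filled big-endian layout described above.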

The only reason why one wouldn't want this
functionality in the struct module is "This module
performs conversions between Python values and C
structs represented as Python strings." and arbitrarily
large integers are not traditionally part of a C struct
(though I am sure many of us have implemented arbitrary
precision integers with structs).  The reason "not a C
type" has been used to quash the 'bit' and 'nibble'
format characters, because "masks and shifts" are able
to emulate them.  Though "masks and shifts" could
also be used here, I and others have said
that there should be an easy method for converting
between large longs and strings.


A side-effect for allowing arbitrarily large integers
to be represented in this fashion is that its
functionality could, if desired, subsume the other
integer type characters, as well as fill in the gaps
for nonstandard size integers (3, 5, 6, 7 etc. byte
integers), that I (and I am sure others) have used in
various applications.
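
To show what handling a nonstandard size looks like without such a code, here is a minimal Python 3 sketch for 3-byte little-endian unsigned integers that borrows the 4-byte 'I' code and pads or trims one byte. The helper names are hypothetical.

```python
import struct

def unpack_u24_le(data):
    """Read a 3-byte little-endian unsigned integer using the 4-byte 'I' code."""
    if len(data) != 3:
        raise ValueError('expected exactly 3 bytes')
    # Little-endian: the missing most significant byte goes at the end.
    return struct.unpack('<I', data + b'\x00')[0]

def pack_u24_le(value):
    """Write a 3-byte little-endian unsigned integer."""
    if not 0 <= value < 1 << 24:
        raise ValueError('value does not fit in 3 bytes')
    # Pack as 4 bytes, then drop the (zero) most significant byte.
    return struct.pack('<I', value)[:3]
```

The padding byte has to move to the front for big-endian data, and signed values would need sign extension instead of a zero byte, so each size and byte order ends up as its own special case.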


Currently no implementation exists, and I don't have
time to do one now.  Having taken a look at
longobject.c and structmodule.c, I would likely be able
to make a patch to the documentation, structmodule.c,
and test_struct.py around mid October, if this
functionality is desirable to others and accepted.
While I doubt that a PEP for this is required, if
necessary I would write one up with a sample
implementation around mid October.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2004-09-11 23:40

Message:
Logged In: YES 
user_id=31435

binascii makes sense because that's where the hexlify and 
unhexlify functions live, which are small conceptual steps 
away from what's needed here.

Methods on numbers make sense too, and only seem strange
because so few are clearly visible now (although there are
lots of them already, e.g. number.__abs__ and
number.__add__).

The struct module makes sense too, although it would be 
darned ugly to document a refusal to accept the new codes 
in "native" mode; and struct has a high learning curve; and 
struct obviously never intended to support types that aren't 
supplied directly by C compilers (the "Pascal string" code 
seems weird now, but not at the time).

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-09-10 19:18

Message:
Logged In: YES 
user_id=341410

(sorry it took me a few days to get back to you, I am on a
contract deadline crunch...just taking a break now)

The *HTTPServer hierarchy is interesting in its own right,
but really, each piece in the hierarchy adds functionality.
 A similar thing can be said of asyncore and all the modules
that derive from it (asynchat, *HTTPServer, *XMLRPCServer,
smtpd, etc.).

In this case, since the struct module is already in C and
the functions are not subclassable, creating another module
that parses strings and sends pieces off to struct for
actual decoding seems like a waste of a module, especially
when the change is so minor.  Now, binascii is being used in
such a fashion by uu and binhex, but that is because
binascii is the data processing component, where uu and
binhex make a 'pretty' interface.  Struct doesn't need a
pretty interface, it is already pretty.  Though as I have
said before, I think it could use this small addition.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-09-08 18:53

Message:
Logged In: YES 
user_id=21627

Since you were asking: it is quite common that modules refer
to related functionality. For example, BaseHTTPServer refers
to SimpleHTTPServer and CGIHTTPServer. One might expect that
a HTTP server also supports files and does CGI - but not
this one; go elsewhere. Likewise, module binascii refers to
modules uu and binhex. The math documentation points out
that it does not support complex numbers, and that cmath is
needed. The audioop documentation gives the code for an
echocancel function instead of implementing it.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-09-08 18:38

Message:
Logged In: YES 
user_id=341410

Martin, I was typing as you submitted your most recent comment.

I am honestly shocked that you would suggest that longs
should gain a method for encoding themselves as binary
strings.  Such a thing would then suggest that standard ints
and floats also gain such methods.  It would also imply that
since one can go to strings, one should equivalently be able
to come from strings via equivalent methods.  Goodness,
int.tostring(width) and int.fromstring(str)?  But what about
endianness?

Looks like a big can of worms to me.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-09-08 18:03

Message:
Logged In: YES 
user_id=341410

Structures (aka C Structs) can contain arbitrarily large or
small numbers of basic types inside them.  As such, 'single long
values' are still a valid use.  I use struct for packing and
unpacking of single items (8, 4, 2 byte integers; 1 byte
integers are served faster via chr and ord) when necessary
(because it is the most convenient), as well as a current
contract where it is not uncommon to be packing and
unpacking 256 byte structs.

Those large structs contain various 1,2,4 and 8 byte
integers, as well as a handful of 16 and 20 byte integers
(which I must manually shift and mask during packing and
unpacking).  I'm a big boy, and can do it, but that doesn't
mean that such functionality should be left out of Python.
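
As a hedged sketch of this mixed case in Python 3: the standard-size fields go through struct, while a 20-byte integer travels as an 's' field and is converted separately. The record layout and function names below are invented for illustration and are not taken from the contract described above.

```python
import struct

# Hypothetical record layout: 2-byte id, 4-byte length, 20-byte integer.
RECORD = struct.Struct('>HI20s')

def pack_record(rec_id, length, big_value):
    # struct handles the fixed-size fields; the 20-byte integer travels
    # as an 's' field and is converted separately.
    return RECORD.pack(rec_id, length, big_value.to_bytes(20, 'big'))

def unpack_record(data):
    rec_id, length, raw = RECORD.unpack(data)
    return rec_id, length, int.from_bytes(raw, 'big')
```

With a dedicated format code, the second conversion step would fold into the format string itself.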

As for 'document the approach of going through hex inside
the documentation of the struct module', I am curious about
whether other modules do the same thing, that is to tell
users "this functionality conceptually fits here X%, which
is why it is documented here, but because it does not fit
100%, here is how you can do the same thing, which will
likely look like a strange hack, require slicing potentially
large strings, and be significantly slower than if we had
just added the functionality, but here you go anyways."

Now, I don't /need/ the feature, but I believe that I and
others would find it useful.  I also don't /require/ it be
in struct, but no other modules offer equivalent
functionality; pickle and marshal are Python-only, binascii
(and binhex) are for converting between binary and ascii
representations for transferring over non-8-bit channels
(email, web, etc.), and no other module even comes close to
offering a similar bit of "packs various types into a binary
format, the same way C would" as struct.

If anyone has a better place for it, I'm all ears (or eyes).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-09-08 17:45

Message:
Logged In: YES 
user_id=21627

I would think that

from binascii import unhexlify

def long_as_bytes(lvalue, width):
    # zero-padded hex of the low 8*width bits, then hex -> raw bytes
    fmt = '%%.%dx' % (2*width)
    return unhexlify(fmt % (lvalue & ((1L<<8*width)-1)))

is short enough for a recipe to not really make a C function
necessary for that feature.
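
For readers on Python 3, a port of the recipe might look like the following sketch; the only changes assumed here are dropping the L suffix and adding the import. It produces big-endian output of exactly `width` bytes.

```python
from binascii import unhexlify

def long_as_bytes(lvalue, width):
    """Python 3 port of the recipe above (big-endian, exactly `width` bytes)."""
    fmt = '%%.%dx' % (2 * width)
    # Masking keeps only the low 8*width bits, so oversized values are
    # silently truncated and negative values come out in two's complement.
    return unhexlify(fmt % (lvalue & ((1 << 8 * width) - 1)))
```

The mask is a design choice worth noticing: a value too large for the width is truncated rather than rejected, which may or may not be what a caller expects.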

However, if they are going to be provided somewhere, I would
suggest that static methods on the long type might be the
right place.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-09-08 15:30

Message:
Logged In: YES 
user_id=80475

The idea is to expose the _PyLong_FromByteArray() and
_PyLong_AsByteArray() functions.  While long(hexlify(b),16)
is doable for bin2long, going the other way is not so simple.

I agree that these are not struct related.  Originally, I
proposed the binascii module because one of the operations
is so similar to hexlify().

Are there other suggestions?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-09-08 15:15

Message:
Logged In: YES 
user_id=21627

Apparently, the point of this request is that the method for
converting long ints to binary should be "easily found in
documentation". And also apparently, the submitter thinks
that the struct module would be the place where people look.

Now, that allows for a simple solution: document the
approach of going through hex inside the documentation of
the struct module.

There is one other reason (beyond being primarily for C
APIs) why such a feature should *not* be in the struct
module: The struct module, most naturally, is about
structures. However, I understand that the intended usage of
this feature would not be structures, but single long
values. Therefore, I consider it counter-intuitive to extend
struct for that.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-09-06 20:02

Message:
Logged In: YES 
user_id=80475

Okay, submit a patch with docs and unittests.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-09-06 19:44

Message:
Logged In: YES 
user_id=341410

As I noted in the feature request, there is already a
method for translating string <-> long.

The problem with current methods for converting between
large integers and strings is that they do not lend
themselves to generally being understandable or to being
documented.

The struct module already provides two appropriate functions
for handling packed binary data and a place for documenting
functions that pack and unpack binary data, and the
implementation seems simple enough (one more format
character, much of it borrowed from the 's' character, plus
a call to _PyLong_FromByteArray seems to be sufficient).

As for the binascii module, many of the functions listed
seem like they should be wrapped into the encode/decode
string methods, hexlify already being so in str.encode('hex').

To me, just being able to translate doesn't seem sufficient
(we already can translate), but being able to do it well,
have it documented well, and placed in a location that is
obvious, fast and optimized for these kinds of things seems
to be the right thing.

From what I can tell, the only reason why struct doesn't
already have an equivalent format character to the proposed
'g' and 'G', is because the module was created to handle
packed C structs and seemingly "nothing else".  Considering
there doesn't seem to be any other reasonable or easily
documentable location for placing equivalent functionality
(both packing and unpacking), I am of the opinion that
restricting the packing and unpacking to C types in the
struct module (when there are other useful types) is overkill.

As I said, I will provide an implementation if desired.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-09-06 18:34

Message:
Logged In: YES 
user_id=80475

FWIW, I'm working on str/long conversion functions for the
binascii module.  Will that suit your needs?

The tolong function is equivalent to:
    long(hexlify(b), 16)

----------------------------------------------------------------------
