[Patches] [ python-Patches-923643 ] long <-> byte-string conversion
SourceForge.net
noreply at sourceforge.net
Thu Sep 16 08:56:10 CEST 2004
Patches item #923643, was opened at 2004-03-25 19:17
Message generated for change (Comment added) made by trevp
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=923643&group_id=5470
Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Trevor Perrin (trevp)
Assigned to: Nobody/Anonymous (nobody)
Summary: long <-> byte-string conversion
Initial Comment:
Sometimes you want to turn a long into a byte-string,
or vice-versa. This is useful in cryptographic protocols,
and probably other network protocols where you need
to exchange large integers.
In 2.4, you can handle unsigned longs with this:
def stringToLong(s):
    return long(binascii.hexlify(s), 16)
def longToString(n):
    return binascii.unhexlify("%x" % n)
However, these functions are slower than they need to
be, they're kinda kludgey, and they don't handle
negative values.
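
For readers on current interpreters, here is a minimal sketch of the same hexlify round trip in Python 3 (an assumption; the thread itself targets Python 2.4). The snake_case names are illustrative only. It also pads odd-length hex, a case the two-liner above does not cover, since unhexlify() rejects odd-length input.

```python
import binascii

def string_to_long(s: bytes) -> int:
    # Read the bytes as one big-endian unsigned number via its hex form.
    # An empty string is treated as zero (a choice made for this sketch).
    return int(binascii.hexlify(s), 16) if s else 0

def long_to_string(n: int) -> bytes:
    if n < 0:
        raise ValueError("the hex round trip only handles non-negative values")
    h = "%x" % n
    if len(h) % 2:
        h = "0" + h  # unhexlify() requires an even number of hex digits
    return binascii.unhexlify(h)
```

With these definitions, long_to_string(string_to_long(s)) returns s for any byte-string without leading zero bytes.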
So here's a proposal:
def stringToLong(s):
    return long(s, 256)
def longToString(n):
    return n.tostring()
These functions operate on big-endian, 2's-complement
byte-strings. If the value is positive but has its most-
significant-bit set, an extra zero-byte will be
prepended. This is the same way OpenSSL and (I think)
GMP handle signed numbers.
These functions are ~5x faster than the earlier ones,
they're cleaner, and they work with negative numbers.
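
To make the encoding concrete, here is a rough Python 3 sketch of the minimal big-endian 2's-complement form described above. It is not the patch; it relies on int.to_bytes() and int.from_bytes(), which exist only in Python 3.2 and later, and the function names are made up for illustration.

```python
def long_to_signed_string(n: int) -> bytes:
    # Count the magnitude bits plus one sign bit, then round up to bytes.
    # For negative n, ~n has the same number of significant bits.
    bits = (n if n >= 0 else ~n).bit_length() + 1
    length = (bits + 7) // 8
    return n.to_bytes(length, "big", signed=True)

def signed_string_to_long(s: bytes) -> int:
    # The top bit of the first byte is interpreted as the sign.
    return int.from_bytes(s, "big", signed=True)
```

Under this rule 127 fits in one byte, while 128 needs the extra leading zero byte so that its top bit is not mistaken for a sign.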
If you only want to deal with unsigned positive numbers,
you'll have to do some adjustments:
def stringToLong(s):
    return long('\0'+s, 256)
def longToString(n):
    s = n.tostring()
    if s[0] == '\0' and s != '\0':
        s = s[1:]
    return s
That's not ideal, but it seems better than any interface
change I could think of.
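
For comparison, a hedged Python 3 sketch of the unsigned case, assuming int.to_bytes() and int.from_bytes() (Python 3.2+, not available in 2.4). No leading zero byte has to be stripped here because the length is computed from the magnitude alone. The names are hypothetical.

```python
def long_to_unsigned_string(n: int) -> bytes:
    if n < 0:
        raise ValueError("unsigned encoding cannot represent negative values")
    # At least one byte, so that zero encodes as a single zero byte.
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")

def unsigned_string_to_long(s: bytes) -> int:
    # Every bit counts toward the magnitude; there is no sign bit.
    return int.from_bytes(s, "big")
```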
Anyways, the patch adds this to longs. It should be
added to ints too, and I guess it needs tests, etc. I
can help with that, if the basic idea is acceptable.
Trevor
----------------------------------------------------------------------
>Comment By: Trevor Perrin (trevp)
Date: 2004-09-15 23:56
Message:
Logged In: YES
user_id=973611
Thanks for the pointer. I'm actually not smitten with any
of the approaches for long<->byte-string conversion (mine
included). Now that I think about it, the best interface
might be if Python had a 'bytes' type, then both conversions
could be done by constructors:
long(bytes)
bytes(long)
If Python is going to grow such a type in the future, maybe
we can defer this question, and live with the hexlifying and
whatnot, till then.
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-09-15 09:49
Message:
Logged In: YES
user_id=341410
As an aside, I've requested similar functionality be added
to the struct module, which handles signed, unsigned,
big-endian, little-endian, and integers of arbitrary length.
The request is available here:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1023290&group_id=5470
Would you prefer two functions that live as methods of ints
and longs, or would you prefer the functionality be placed
inside of the struct module (which already does numeric
packing and unpacking), and have its own type code?
----------------------------------------------------------------------
Comment By: Trevor Perrin (trevp)
Date: 2004-09-15 02:27
Message:
Logged In: YES
user_id=973611
Uploading a new patch (base256.diff). This implements only
the string -> long (or int) conversion. It adds support for
radix 256 (unsigned) or -256 (2's-complement signed) to the
int() and long() built-ins:
int("\xFF\xFF\xFF", 256) -> 0xFFFFFF
int("\xFF\xFF\xFF", -256) -> -1
long(os.urandom(128), 256) -> 1024-bit integer
I left out the long -> string conversion. If python adds a
bytes() type, then that conversion could be done as
bytes(long). This patch has docs and tests.
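
The radix convention above can be imitated in Python 3 with a small wrapper. This is only a sketch under assumptions: from_base256 is a hypothetical name, big-endian byte order is assumed from the initial comment, and int.from_bytes() is a Python 3.2+ method rather than anything in base256.diff.

```python
import os

def from_base256(s: bytes, radix: int = 256) -> int:
    # radix 256 reads the bytes as unsigned, -256 as 2's-complement signed.
    if radix == 256:
        return int.from_bytes(s, "big", signed=False)
    if radix == -256:
        return int.from_bytes(s, "big", signed=True)
    raise ValueError("radix must be 256 or -256")

unsigned_value = from_base256(b"\xff\xff\xff", 256)   # 0xFFFFFF
signed_value = from_base256(b"\xff\xff\xff", -256)    # -1
random_value = from_base256(os.urandom(128), 256)     # at most 1024 bits
```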
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson)
Date: 2004-03-29 23:10
Message:
Logged In: YES
user_id=341410
I'm curious to know if anyone would object to optional
minimum or maximum or both arguments, or even some
additional methods that would result in a potentially
constrained string output from long.tostring()?
If I were to split the functionality into three methods,
they would be as follows...
def atleast(long, atl):
    if atl < 0:
        raise TypeError("atleast requires a positive integer for a minimum length")
    a = long.tostring()
    la = len(a)
    return (atl-la)*'\0' + a
def atmost(long, atm):
    if atm < 0:
        raise TypeError("atmost requires a positive integer for a maximum length")
    a = long.tostring()
    la = len(a)
    return a[:atm]
def constrained(long, atl, atm):
    if atm < atl:
        raise TypeError("constrained requires that the maximum length be larger than the minimum length")
    if atl < 0 or atm < 0:
        raise TypeError("constrained requires that both arguments are positive")
    a = long.tostring()
    la = len(a)
    return ((atl-la)*'\0' + a)[:atm]
I personally would find use for the above, would anyone else
have use for it?
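
A possible Python 3 rendering of the "at least" idea, offered only as a sketch: the function name is invented, and int.to_bytes() (Python 3.2+) stands in for the proposed tostring(). One point worth noting is that with a signed 2's-complement string, padding a negative value with zero bytes would change it; to_bytes() with signed=True pads with 0xFF bytes for negative values instead.

```python
def to_signed_bytes_at_least(n: int, min_len: int) -> bytes:
    if min_len < 0:
        raise ValueError("min_len must be non-negative")
    # Minimal signed length: magnitude bits plus one sign bit, rounded up.
    bits = (n if n >= 0 else ~n).bit_length() + 1
    length = max((bits + 7) // 8, min_len)
    # signed=True sign-extends: 0x00 padding for n >= 0, 0xFF for n < 0.
    return n.to_bytes(length, "big", signed=True)
```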
----------------------------------------------------------------------
Comment By: Trevor Perrin (trevp)
Date: 2004-03-28 16:55
Message:
Logged In: YES
user_id=973611
My last comment was wrong: GMP's raw input/output format
uses big-endian positive values, with the sign bit stored
separately.
----------------------------------------------------------------------
Comment By: Trevor Perrin (trevp)
Date: 2004-03-26 23:51
Message:
Logged In: YES
user_id=973611
I think 2's complement makes good sense for arbitrary
precision longs. This is how OpenSSL and GMP handle them.
It's also how the ASN.1 BER/DER encodings handle integers:
these encodings just prepend tag and length fields to the big-
endian 2's complement value. I.e.: If you want to extract
RSA public values from an X.509 certificate, they'll be in 2's
complement (well, they'll always be positive... but they'll
have an extra zero byte if necessary).
Since the functionality for 2's complement is already in the C
code it's easy to expose through a patch. So I'm still in favor
of presenting it.
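
To illustrate the DER point, here is a small Python 3 sketch of encoding an INTEGER: tag byte 0x02, a length field, then the minimal big-endian 2's-complement value. It assumes int.to_bytes() (Python 3.2+), and der_encode_integer is a name chosen for this example, not part of any patch or library.

```python
def der_encode_integer(n: int) -> bytes:
    # Minimal 2's-complement body: magnitude bits plus one sign bit.
    bits = (n if n >= 0 else ~n).bit_length() + 1
    body = n.to_bytes((bits + 7) // 8, "big", signed=True)
    if len(body) < 0x80:
        # Short form: a single length byte.
        length = bytes([len(body)])
    else:
        # Long form: 0x80 | count of length bytes, then the length itself.
        len_bytes = len(body).to_bytes((len(body).bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(len_bytes)]) + len_bytes
    return b"\x02" + length + body
```

A positive value with its top bit set, such as 128, comes out as 02 02 00 80, which is the extra zero byte mentioned above.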
----------------------------------------------------------------------
Comment By: paul rubin (phr)
Date: 2004-03-26 22:57
Message:
Logged In: YES
user_id=72053
How about just punting on signed conversion? I don't think
two's complement makes much sense for arbitrary precision
longs. Have some separate representation for negative longs
if needed. If you call hex() on a large negative number,
you get a hex string with a leading minus sign. For base
256, you can't reserve a char like that, so I guess you have
to just throw an error if someone tries to convert a
negative long to a string. If you want a representation for
signed longs, ASN1 DER is probably an ok choice. I agree
with Guido that the binascii module is a good place to put
such a function. Twos complement can work if you specify a
fixed precision, but that sure complicates what this started
out as.
----------------------------------------------------------------------
Comment By: Trevor Perrin (trevp)
Date: 2004-03-26 22:45
Message:
Logged In: YES
user_id=973611
You're right, we should support unsigned strings somehow.
Adding another argument to the int() and long() constructors
would be messy, though. How about:
n = long(s, 256) #unsigned
n = long(s, -256) #signed
n.tounsignedstring()
n.tosignedstring()
The "-256" thing is a hack, I admit.. but it kinda grows on
you, if you stare at it awhile :-)...
----------------------------------------------------------------------
Comment By: paul rubin (phr)
Date: 2004-03-25 19:53
Message:
Logged In: YES
user_id=72053
I think those funcs should take an optional extra arg to say
you want unsigned. That's cleaner than prepending '0'. In
cryptography you usually do want unsigned.
----------------------------------------------------------------------