[SPAM-Bayes] - Re: Converting IBM Floats..Help.. - Bayesian Filter detected spam
jepler at unpythonic.net
Thu Mar 25 20:51:57 CET 2004
> Bayesian Filter detected spam
huh -- sorry your filter thought I was spam, I'm glad you found my
response. I've had very good luck with spambayes, personally.
Here's one page I found with an explanation of how more standard
floating-point numbers are stored:
The most important difference between the IEEE formats and the IBM
format is that the radix is 16 instead of 2. I think there's also a
terminology error--the page says that the significand (mantissa) is
stored in "two's compliment binary", but this isn't true. It's
unsigned. The sign is stored separately.
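
As a rough illustration of those two points, here is a small sketch in
Python 3 syntax. The hex constants are sample encodings of +1.0 and -1.0
that I worked out by hand for this example, and the variable names are
made up:

```python
import struct

# Assumed samples: IBM 64-bit encodings of +1.0 and -1.0
plus_one = 0x4110000000000000
minus_one = 0xC110000000000000

# The two values differ only in the top (sign) bit
difference = plus_one ^ minus_one

# The low 56 bits (the unsigned mantissa) are identical
mantissa_mask = (1 << 56) - 1
same_mantissa = (plus_one & mantissa_mask) == (minus_one & mantissa_mask)

# For contrast, the radix-2 IEEE double for 1.0 has a different bit pattern
ieee_one = struct.pack(">d", 1.0).hex()
```

Negating the value flips one bit and leaves the mantissa alone, which is
what you would expect from sign-magnitude storage rather than two's
complement.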
On Thu, Mar 25, 2004 at 02:17:16PM -0500, Ian Sparks wrote:
> Thanks so much for this, it works fine on the rest of my data too. I'm going to have to create code that will do the reverse conversion too, so I want to make sure I understand what happened here so I can apply the same concepts. I have some instructions on the reverse process and I believe it's mostly bit-shifting.
> This is somewhat "teach a man to fish" so I appreciate your bearing with me on this basic CS stuff. I think I'm nearly there...
> def ibm360_decode(s):
> #Get element 0 of the unpack tuple
> #struct format : > = Big Endian, Q means unsigned long long
> #not sure why we need an unsigned long long ?
> #think its because we need that to do bitwise operations?
I used Q because the data is 64 bits, and I wanted to get it in a
single Python variable. I could have used 'l, m = struct.unpack(">LL", s)'
to convert the string into a pair of integers, but that would have been
less convenient because the mantissa would be split between l and m.
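
To make the difference concrete, here is a minimal sketch in Python 3
syntax. The sample bytes are an assumed encoding of 1.0, and the names
hi, lo and rejoined are only for illustration:

```python
import struct

# Assumed sample: 8 bytes holding the IBM double for 1.0
s = bytes.fromhex("4110000000000000")

# One 64-bit unsigned integer: sign, exponent and mantissa in one value
(q,) = struct.unpack(">Q", s)

# Two 32-bit unsigned integers: the 56-bit mantissa straddles both halves
hi, lo = struct.unpack(">LL", s)

# The halves have to be glued back together before the mantissa can be masked out
rejoined = (hi << 32) | lo
```

With ">Q" the shift-and-mask steps can work on a single value, which is
the convenience I was after.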
> exponent_with_sign = (l >> 56)
> #But we still have our sign-bit at the left side so we've got 8 digits
> #we need to cut off that extra digit
> #we can do this using the bitwise & to compare our number against 00111111
> # (0x7f - 64 == 127 - 64 == 63 = 111111)
> # e.g. for 7 with a positive sign we'd have
> # 7 = 10000111 (looks like 135)
> # 63 = 00111111 &
> # --------
> # 0000111 = 7
> # But I'm clearly missing something here because 63 is only 6 binary-digits long and we need
> # to cut off the 8th...127 would seem to be more appropriate (but doesn't work). Umph ?
> exponent = exponent_with_sign & 0x7f - 64
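(exponent_with_sign & 0x7f) extracts the 7 bits of the exponent.
However, the exponent is biased. Instead of being stored as a signed
number, the unsigned (exponent + bias) is stored. The subtraction
removes the bias. The parentheses matter: in Python, - binds tighter
than &, so without them you get exponent_with_sign & 63, which only
gives the right answer when the true exponent is not negative.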
exponent_with_sign_and_bias = (l>>56)
exponent_with_bias = exponent_with_sign_and_bias & 0x7f # 7 bits
exponent = exponent_with_bias - 64
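
Here is a small sketch of why the unparenthesized line in your version
seems to AND with 63. The sample value is an assumed encoding of 1/256,
chosen because its true exponent is negative:

```python
l = 0x3F10000000000000  # assumed sample: IBM double for 1/256

exponent_with_sign_and_bias = l >> 56                     # 0x3f (sign bit is 0 here)
exponent_with_bias = exponent_with_sign_and_bias & 0x7f   # 63
exponent = exponent_with_bias - 64                        # -1

# '-' binds tighter than '&', so this is really "& (0x7f - 64)", i.e. "& 63"
unparenthesized = exponent_with_sign_and_bias & 0x7f - 64  # 63, not -1
```

For biased exponents of 64 and up, masking with 63 happens to clear the
same bit that subtracting 64 would, so the two forms agree on larger
numbers and only disagree on small ones like this.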
> #Ok, moving on.
> #The mantissa is the last 56 bits (reading from left->right) so we need to cut off the first
> #8 bits which means ANDing (&) with a large number representing all 1's for 56 of those bits
> #and zero for the others e.g. 00000000111111...1111
> #The / by 16. ** 14 has me scratching my head, I admit. Our mantissa is firmly planted in the
> #right hand side of our binary number isn't it? But we're dividing by a massive number....?
> mantissa = (l & ((1L<<56) - 1)) / (16. ** 14)
The mantissa is an unsigned binary number. If you treat it as a float,
the radix point (decimal point) is actually to the left of the first bit
of the mantissa. The mantissa is 56 bits, or 14 hex digits, so the
division by 16. ** 14 accomplishes this.
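
A quick sketch of that step in Python 3 syntax (so no L suffix on the
integer literal). The sample value is again an assumed encoding of 1.0:

```python
l = 0x4110000000000000  # assumed sample: IBM double for 1.0

# Keep the low 56 bits (14 hex digits) as a plain integer
mantissa_bits = l & ((1 << 56) - 1)

# 16.0 ** 14 equals 2.0 ** 56, so this puts the radix point
# to the left of all 14 hex digits
mantissa = mantissa_bits / (16.0 ** 14)
```

The integer 0x10000000000000 becomes the fraction 0.0625, i.e. hex 0.1.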
> #The instructions said the true exponent is 16 * the exponent value we extracted, again, not
> #sure why we're multiplying the mantissa up too?
> return [1,-1][sign] * (16**exponent) * mantissa
The exponent shifts the radix point again, to its final position.
When the exponent changes by 1, the radix point moves by one hex digit,
or 4 bits.
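
Putting the pieces together, here is a sketch of the whole decoder in
Python 3 syntax. It is meant as an illustration rather than a tested
library routine, and the sample byte strings are encodings I worked out
by hand:

```python
import struct

def ibm360_decode(s):
    """Decode 8 big-endian bytes holding an IBM double into a Python float."""
    (l,) = struct.unpack(">Q", s)
    sign = l >> 63                                    # top bit
    exponent = ((l >> 56) & 0x7f) - 64                # 7 bits, bias removed
    mantissa = (l & ((1 << 56) - 1)) / (16.0 ** 14)   # fraction in [0, 1)
    return [1, -1][sign] * (16.0 ** exponent) * mantissa

# Assumed samples: same mantissa, exponent field 0x41 versus 0x42
one = ibm360_decode(bytes.fromhex("4110000000000000"))
sixteen = ibm360_decode(bytes.fromhex("4210000000000000"))

# Assumed sample with the sign bit set
negative = ibm360_decode(bytes.fromhex("C276A00000000000"))
```

The first two samples differ only in the exponent field, and the decoded
values differ by a factor of 16, which is the one-hex-digit shift of the
radix point.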