ANN: mxNumber -- Experimental Number Types, Version 0.2.0

Brian Kelley kelley at bioreason.com
Mon Apr 30 11:07:20 EDT 2001


> That's fine for C, but makes no sense in a Python interface; i.e., wtf is
> MAX_ULONG in Python terms?  Python doesn't even have an unsigned integral
> type.
>
> So that's where the silly arguments start.  Just pick *something*.  For
> example, sys.maxint is closest in spirit to MAX_ULONG, but shares the defect
> of the GMP definition that it's ambiguous whether it means "infinity" or "a
> whole lot but nevertheless finite" in this context.  -1 would make more sense
> for Python, and is not ambiguous; GMP doesn't have that choice, though, since
> it returns an unsigned result.
>

Hmmm.  It looks like I missed most of the earlier discussion; I'll have to dig
through Deja News.
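A bit count shows the sentinel question clearly, because a negative integer has infinitely many 1 bits in two's complement. The sketch below follows the quoted suggestion and returns -1 for that case instead of a C-style "largest unsigned long". `popcount` is a hypothetical helper written for illustration, not an mx.Number or GMP API.

```python
def popcount(n: int) -> int:
    """Count the 1 bits in n (hypothetical helper).

    A negative int has infinitely many 1 bits in two's complement, so this
    sketch returns -1: unambiguous, and impossible as a real count.
    """
    if n < 0:
        return -1  # "infinite" result, per the -1 convention
    return bin(n).count("1")
```

Because no genuine count can be negative, a caller can test `popcount(x) < 0` without wondering whether a huge positive value means "infinite" or merely "very large".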


> > more good stuff at
> > http://www.swox.com/gmp/manual/gmp_6.html#SEC30
>
> Right, they have lots of good stuff.  The functions aren't all well-defined
> in Python terms, though, and sometimes not even in C terms; e.g.,
>
>     Function: unsigned long int
>               mpz_scan1 (mpz_t op, unsigned long int starting_bit)
>     Scan op, starting with bit starting_bit, towards more significant
>     bits, until the first set bit is found.  Return the index of the
>     found bit.
>
> The docs there really don't define what "starting_bit" or "index" mean
> (perhaps 0-based, with index i being bit 2**i?  i.e., starting with 0 "from
> the right"?).  Then what do you think mpz_scan1(0, 0) returns?  That is,
> there are no 1 bits in 0 for scan1 to find.  I can guess that they return
> MAX_ULONG again in such cases, but they don't say so, and as above -1 is
> probably a better result for Python to return.
>

I was confused by this as well; I had to expose the function and experiment with
it to figure out what they meant.
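The reading guessed at in the quoted text (0-based indices, index i meaning the bit worth 2**i, counting from the right) can be written down as a minimal pure-Python sketch. It returns -1 when no 1 bit exists at or above `starting_bit`, as suggested for Python, rather than whatever GMP returns; `scan1` here is a hypothetical function, and negative operands are deliberately left out.

```python
def scan1(op: int, starting_bit: int) -> int:
    """Index of the first 1 bit in op at or above starting_bit, else -1.

    Assumes 0-based indexing where index i is the bit worth 2**i.
    Sketch only: negative op (infinite 1 bits) is not handled.
    """
    if op < 0 or starting_bit < 0:
        raise ValueError("sketch handles only non-negative op and starting_bit")
    # Clear every bit below starting_bit so it cannot be found.
    remaining = op >> starting_bit << starting_bit
    if remaining == 0:
        return -1  # no 1 bit to find, e.g. scan1(0, 0)
    # remaining & -remaining isolates the lowest set bit;
    # bit_length() - 1 converts that power of two into its index.
    return (remaining & -remaining).bit_length() - 1
```

With this definition `scan1(0b10100, 0)` is 2, `scan1(0b10100, 3)` is 4, and both `scan1(0b10100, 5)` and `scan1(0, 0)` are -1.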

>
> > This is more what I meant:
> >
> > >>> i = mx.Number.Integer("100101011101010")
> > >>> pickle.dumps(i, 0)
> > "cmx.Number\n_I\np0\n(S'100101011101010'\np1\ntp2\nRp3\n."
> >
> > The string S'100101011101010' is a fairly wasteful encoding for a
> > bit vector.
>
> Sure.  Is it actually a problem for you in practice, or is it just something
> that offends because it's provably less than optimal?  Note that text-mode
> pickles are *meant* to be easily human-readable too, and there's no clearer
> way to "encode" the decimal integer 100101011101010 than as the string
> "100101011101010"

It is a problem in practice.  I am writing a caching system for bit vectors, and
response time is important.  I have no problem with text-mode pickles; it just
seems slightly odd that binary mode uses (essentially) the same encoding, while
marshal seems to have a much more efficient binary encoding.
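To put rough numbers on "wasteful", the minimal sketch below compares the payload size of a non-negative int written as decimal text (about 3.3 bits per character), hex text (4 bits per character) and raw bytes (8 bits per byte). `encoding_sizes` is a hypothetical helper using plain Python ints; it ignores pickle and marshal framing entirely.

```python
def encoding_sizes(n: int) -> dict:
    """Payload bytes for non-negative n under three encodings (sketch only)."""
    if n < 0:
        raise ValueError("sketch handles only non-negative values")
    return {
        "decimal": len(str(n)),            # ~3.3 bits per character
        "hex": len(format(n, "x")),        # 4 bits per character
        "raw": (n.bit_length() + 7) // 8,  # 8 bits per byte (0 -> 0 bytes)
    }
```

For a 1000-bit vector with every bit set, `encoding_sizes(2**1000 - 1)` gives 302 decimal characters, 250 hex characters and 125 raw bytes.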

> -- Python does the same for its own native long (unbounded
> int) pickles.  A mild compromise would be to use a hex string instead (still
> easily readable, encodes 4 bits per byte instead of ~3.3, and should be very
> much faster for pickle<->internal conversions of very long ints).

I was thinking along these same lines.
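The hex-string compromise can be sketched with `__reduce__`, which lets a class choose what pickle stores. The `BitVector` class and `_bitvector_from_hex` loader below are hypothetical, wrap a plain Python int, and are not mx.Number's actual pickle format; the point is only that the stored argument becomes a hex string instead of a decimal one.

```python
import pickle


def _bitvector_from_hex(text: str) -> "BitVector":
    # Module-level loader so pickle can locate it by name when unpickling.
    return BitVector(int(text, 16))


class BitVector:
    """Hypothetical bit vector stored as a non-negative Python int."""

    def __init__(self, value: int = 0) -> None:
        if value < 0:
            raise ValueError("BitVector holds non-negative values only")
        self.value = value

    def __reduce__(self):
        # On load, pickle calls _bitvector_from_hex(<hex string>).
        return (_bitvector_from_hex, (format(self.value, "x"),))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, BitVector) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)

    def __repr__(self) -> str:
        return f"BitVector(0x{self.value:x})"
```

For example, `pickle.loads(pickle.dumps(BitVector(0xDEADBEEF), 0))` round-trips, and the text-mode pickle carries `deadbeef` rather than the decimal `3735928559`. Hex keeps text-mode pickles readable; a binary protocol could store raw bytes instead.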

Anyway, it seems I can sidestep the whole problem by renaming mx.Number.Integer
as "BitVector", since that is what I am using this structure for anyway.  So let
me ask: would anyone object to a contributed BitVector type in mx.Number?  Then I
could add all of the fun stuff, like Tanimoto, Euclidean and Jaccard distances...
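As a rough idea of what those distances look like on bit vectors held as non-negative ints, here is a minimal sketch using the standard binary definitions: Tanimoto similarity is |A & B| / |A | B| (identical to Jaccard similarity for binary data), Jaccard distance is one minus that, and Euclidean distance is the square root of the number of differing bits. The function names are hypothetical, and treating two all-zero vectors as identical is an assumption.

```python
import math


def _popcount(n: int) -> int:
    # Number of set bits; assumes n is non-negative.
    return bin(n).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity |A & B| / |A | B| for bit vectors stored as ints."""
    union = _popcount(a | b)
    if union == 0:
        return 1.0  # assumption: two empty vectors count as identical
    return _popcount(a & b) / union


def jaccard_distance(a: int, b: int) -> float:
    """1 - similarity; Jaccard and Tanimoto similarity coincide for bits."""
    return 1.0 - tanimoto(a, b)


def euclidean(a: int, b: int) -> float:
    """sqrt of the number of differing bits, i.e. sqrt(Hamming distance)."""
    return math.sqrt(_popcount(a ^ b))
```

For `a = 0b1101` and `b = 0b0111`, two of four set bits are shared, so `tanimoto(a, b)` is 0.5, `jaccard_distance(a, b)` is 0.5, and `euclidean(a, b)` is `math.sqrt(2)` because two bits differ.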

Thanks for listening.

Brian Kelley




More information about the Python-list mailing list