Efficiency of using long integers to hold bitmaps

Jeff Melvaine jeffm at rivernet.com.au
Tue Jul 12 19:24:48 CEST 2005


Bengt,

Thanks for your informative reply, further comments interleaved.

"Bengt Richter" <bokr at oz.net> wrote in message 
news:42d165d0.311819733 at news.oz.net...
> On Mon, 11 Jul 2005 02:37:21 +1000, "Jeff Melvaine" 
> <jeffm at rivernet.com.au> wrote:
>
>>I note that I can write expressions like "1 << 100" and the result is 
>>stored
>>as a long integer, which means it is stored as an integer of arbitrary
>>length.  I may need to use a large number of these, and am interested to
>>know whether the storage efficiency of long integers is in danger of
>>breaking my code if I use too many.  Would I do better to write a class 
>>that
>>defines bitwise operations on arrays of integers, each integer being 
>>assumed
>>to contain at most 32 bits?  I cannot find information in the Python 
>>manuals
>>for 2.4.1 that would allow me to resolve this question; perhaps the
>>intention is that programmers should not rely on implementation details.
>>
>>Thanks in advance,
>>
> Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization 
> worry ;-)
>
I'm writing a Sudoku solver of generic order.  The object is not to make it 
half a millisecond faster than the guy next door's solver, but I'd like it 
to be able to do a bit more than just the daily newspaper puzzle, e.g. 
search for uniquely solvable puzzles with minimal numbers of clues. 
NP-completeness will put a lid on things sooner or later, but I'd like to 
get as far as possible before that happens.
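For concreteness, here is one common way (not necessarily the one I'm using) that small bitmaps map onto Sudoku: one bit per candidate digit for a cell, bits 0..8 standing for digits 1..9. The names are illustrative only:

```python
# Illustrative only: a per-cell candidate bitmap for order-3 Sudoku,
# with one bit per candidate digit (bits 0..8 for digits 1..9).

ALL = (1 << 9) - 1          # all nine digits still possible

cell = ALL
cell &= ~(1 << 4)           # eliminate digit 5 (bit 4)
cell &= ~(1 << 0)           # eliminate digit 1 (bit 0)

remaining = [d + 1 for d in range(9) if cell >> d & 1]
assert remaining == [2, 3, 4, 6, 7, 8, 9]
assert bin(cell).count("1") == 7
```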

Why in Python?  All my recent professional experience is in writing Ada, 
which is not my idea of a rapid prototyping language (or a rapid deliverable 
item handover language either, for that matter :).

Why do I want it to be efficient during debugging, rather than after fine 
tuning?  I take your point, but in a sense the ability to handle large 
problems is part of the proof of concept.

> What is a "large number of these" going to amount to? How many, tops?
> And how many bits in each? How many operations between them? (Since 
> integers
> are immutable, operations mean allocation of space for new ones for 
> results
> and disposing of unused garbage ones (probably back to a special fast pool 
> for
> integers and longs)). Are you interested in a speed/memory tradeoff?
>
The algorithms that interest me most do not involve cyclic data structures, 
so I am trusting in built-in reference counts to avoid memory leaks.  At the 
moment I'm expecting to use bitmaps of constant size (81 for order 3, or 256 
for order 4) for the most numerous data items, so fragmentation should not 
be excessive.

Space was my first thought, but I also expect that the parallelism of 
bitwise operations will be reasonably time-efficient.  I would hope to be 
able to do operations more quickly on a bitmap of n bits than on a list or 
array of n integer variables with values constrained to 0 or 1.  However the 
prospect of writing "a & b" and getting multiword functionality that could 
prove the concepts was rather appealing too; almost executable pseudocode.
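To sketch what I mean by "executable pseudocode" (names invented for illustration): a single `&` or `|` between Python long integers operates on all 81 bits of an order-3 board mask at once.

```python
# A hedged sketch, not actual solver code: treating one Python integer
# as an 81-bit board mask for an order-3 grid, one bit per cell.

def bit(n):
    """Bitmap with only bit n set."""
    return 1 << n

row0 = sum(bit(i) for i in range(9))       # cells 0..8 (top row)
col0 = sum(bit(9 * i) for i in range(9))   # cells 0, 9, 18, ..., 72

both = row0 & col0      # a single "a & b" spans all 81 bits at once
either = row0 | col0

assert both == bit(0)                      # only cell 0 is in both
assert bin(either).count("1") == 17        # 9 + 9 - 1 shared cell
```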

The effectiveness of the algorithms will determine how much time and space I 
use, but for NP-complete problems the ceiling is always too low, and one is 
constantly learning new ways to duck.

> If your bit vectors are extremely large and sparse (have only a few bits 
> "on"),
> you might consider sets (import sets and help(sets)) of the bit numbers as 
> representations.
>
> BTW, I wonder if anyone has written an ordered bit set class in C yet. I 
> was tempted ;-)
>
For symbolic Boolean algebra on systems of 729 or 4096 variables, sparse is 
the way to go, but I would want ordered sets too.  I've already implemented 
a Boolean algebra system using Python lists (oops, thank you again Raymond) 
of 32-bit bitmaps, and the calculations it does are not dazzlingly fast.  My 
question about using long integers had a different approach in mind.
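For comparison, a minimal sketch of the sparse alternative Bengt suggests: store only the indices of the "on" bits in a set, so the bitwise operations become set operations (the values here are made up for illustration).

```python
# Illustrative sketch: a sparse bit vector stored as a set of the "on"
# bit indices rather than as one wide integer.

a = set([3, 100, 4095])     # three bits set out of 4096 variables
b = set([100, 2000])

# AND/OR/XOR map onto intersection/union/symmetric difference.
assert (a & b) == set([100])
assert (a | b) == set([3, 100, 2000, 4095])
assert (a ^ b) == set([3, 2000, 4095])

# Converting back to the long-integer form when dense operations help:
as_int = sum(1 << i for i in a)
assert (as_int >> 100) & 1 == 1
```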

> How much memory do you have? Buying more can be a pretty cheap way of 
> solving space worries
> if you are getting paid for your time.
>
512 MB.  The only time I've hit the limit so far was when I got distracted
enough to leave out the escape condition in a small recursive function. 
Death was surprisingly rapid.

> You should be able to subclass int or long as a way of writing your 
> program in terms of
> your own bit vector class. Then you can change your mind and change the 
> underlying representation
> without changing the code that uses the api. Compared to plain longs it 
> will cost you some speed
> to keep converting results to your own type though.
>
I like to encapsulate, but I hadn't thought of that one.  Yes, it's an 
approach to getting the best of both worlds: flexibility now, performance 
later if needed.  The thought of executable pseudocode that I translate into 
something better if necessary is not too bad for now.
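A rough sketch of the subclassing idea as I understand it (the class name and methods are invented here, and I've used int where a 2005 Python would subclass long):

```python
# Invented for illustration: a bit-vector class backed by Python's
# arbitrary-precision integers, so the representation can change later
# without touching code that uses the API.

class BitVec(int):
    """Bit vector backed by an arbitrary-precision integer."""

    def __and__(self, other):
        # Re-wrapping each result is the speed cost Bengt mentions.
        return BitVec(int(self) & int(other))

    def __or__(self, other):
        return BitVec(int(self) | int(other))

    def test(self, n):
        """True if bit n is set."""
        return bool(self >> n & 1)

a = BitVec(0b1010)
b = BitVec(0b0110)
c = a & b
assert isinstance(c, BitVec)        # operations stay in the class
assert c == 0b0010
assert c.test(1) and not c.test(0)
```

Swapping the underlying representation later (say, to a list of 32-bit words) would then only mean rewriting BitVec's internals.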

> Bottom line, whether your code will "break" due to storage problems or be 
> fast enough
> will depend on numbers you didn't provide ;-)
>
Re netiquette (with thanks to the other posters who made this thread so 
stimulating): I don't expect respondents to my posted questions to do all my 
thinking for me at a fine level of detail.  But thank you, Bengt, for taking 
the trouble to point me in helpful directions.  It is certainly worth an 
OP's trouble to make that easier for respondents.

Jeff

> Regards,
> Bengt Richter 
