syntax difference
Bart
bc at freeuk.com
Wed Jun 27 08:42:13 EDT 2018
On 27/06/2018 12:42, Peter J. Holzer wrote:
> On 2018-06-27 11:11:37 +1200, Gregory Ewing wrote:
>> Bart wrote:
>>> x = set(range(10_000_000))
>>>
>>> This used up 460MB of RAM (the original 100M I tried exhausted the memory).
>>>
>>> The advantage of Pascal-style sets is that that same set will occupy
>>> only 1.25MB, as it is a bit-map.
>>
>> That's true, but they're also extremely limited compared to
>> the things you can do with Python sets. They're really two
>> quite different things, designed for different use cases.
>>
>> Compact sets of integers are possible in Python, but not
>> as a built-in type. This suggests that they're not needed
>> much
>
> Also, when such sets are needed, a simple array of bits may not be
> sufficient. For example, sets may be sparse, or they may have almost all
> except a few bits set. In these cases a tree or RLE representation is
> much smaller and faster. There are a number of such implementations
> (Judy Arrays and Roaring Bitmaps come to mind[1]). Each has its
> advantages and limitations, so its probably better to leave the choice
> to the author.
From roaringbitmap.org:
"Bitsets, also called bitmaps, are commonly used as fast data
structures. Unfortunately, they can use too much memory. To compensate,
we often use compressed bitmaps."
So from one extreme to another: from the 300-400 bits per element used
by Python's set type, to some presumably tiny fraction of a bit [on
average] used in this scheme, as 1 bit per element is too much.
Is there no place at all for an ordinary, straightforward, uncompressed
bit-set where each element takes exactly one bit? It does seem as though
Python users have an aversion to simplicity and efficiency.
FWIW when I was working with actual image bitmaps, they were nearly
always uncompressed when in memory. 1-bit-per-pixel images, for mono
images or masks, would pack 8 pixels per byte. Compression would only be
used for storage or transmission.
And the same with all the set types I've used.
However there is a slight difference with the concept of a 'set' as I
used it, and as it was used in Pascal, compared with Python's set: there
was the notion of the overall size of the set: the total number of
elements, whether each was '1' or '0'.
So a set type that represented all ASCII codes would have a size of 128
elements, indexed 0 to 127. So such a set initialised to ['A'..'Z'] and
then inverted would give the set [0..64,91..127].
--
bart
More information about the Python-list
mailing list