Comparing sequences with range objects
duncan smith
duncan at invalid.invalid
Fri Apr 8 20:01:50 EDT 2022
On 08/04/2022 22:08, Antoon Pardon wrote:
>
> Op 8/04/2022 om 16:28 schreef duncan smith:
>> On 08/04/2022 08:21, Antoon Pardon wrote:
>>>
>>> Yes I know all that. That is why I keep a bucket of possible duplicates
>>> per "identifying" field that is examined and use some heuristics at the
>>> end of all the comparing instead of starting to weed out the duplicates
>>> at the moment something differs.
>>>
>>> The problem is, that when an identifying field is judged to be unusable,
>>> the bucket to be associated with it should conceptually contain all other
>>> records (which in this case are the indexes into the population list).
>>> But that will eat a lot of memory. So I want some object that behaves as
>>> if it is a (immutable) list of all these indexes without actually
>>> containing them. A range object almost works, with the only problem it is
>>> not comparable with a list.
>>>
>>
>> Is there any reason why you can't use ints? Just set the relevant bits.
>
> Well my first thought is that a bitset makes it less obvious to calculate
> the size of the set or to iterate over its elements. But it is an idea
> worth exploring.
>
def popcount(n):
    """
    Returns the number of set bits in n
    """
    # Assumes n >= 0; a negative int would never reach zero here.
    cnt = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        cnt += 1
    return cnt
and not tested,
def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n
    """
    # Assumes n >= 0; a negative int would never shift down to zero.
    i = 0
    while n:
        if n & 1:
            yield i  # bit i is set
        n = n >> 1
        i += 1
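
For what it's worth, here is a rough sketch of how the "bucket containing every record" case might look with ints. The helper names (full_bucket, from_indices, size, iter_indices) are made up for illustration and are not from your code; size and iter_indices just repeat the two functions above in compact form so the snippet runs on its own. On Python 3.10 or later, int.bit_count() should give the same count as size.

```python
def full_bucket(n_records):
    """Bitset with bits 0..n_records-1 set, i.e. 'every record'."""
    return (1 << n_records) - 1


def from_indices(inds):
    """Bitset with exactly the bits listed in inds set."""
    bits = 0
    for i in inds:
        bits |= 1 << i
    return bits


def size(bits):
    """Number of set bits (assumes bits >= 0)."""
    return bin(bits).count("1")


def iter_indices(bits):
    """Yield the indices of the set bits in ascending order (bits >= 0)."""
    i = 0
    while bits:
        if bits & 1:
            yield i
        bits >>= 1
        i += 1


# Hypothetical population of 10 records.
everything = full_bucket(10)           # bucket for an unusable field
bucket = from_indices([2, 3, 7])       # bucket for a usable field

# Intersecting with the full bucket leaves the other bucket unchanged.
common = everything & bucket
assert common == bucket
assert size(common) == 3
assert list(iter_indices(common)) == [2, 3, 7]

# The full bucket compares equal to a bitset built from range(10).
assert from_indices(range(10)) == everything
```

The full bucket is a single int built with one shift and one subtraction, and it compares equal to a bitset built from any other source with the same members, which is the comparison a range object would not give you against a list.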
Duncan