[Tutor] Why do sets use 6 times as much memory as lists?
Steven D'Aprano
steve at pearwood.info
Wed May 8 03:21:39 CEST 2013
On 08/05/13 03:10, Bjorn Madsen wrote:
>>>> import sys
>>>> L=[x for x in range(10000)]
>>>> sys.getsizeof(L)
> 43816
>>>> L={x for x in range(10000)}
>>>> sys.getsizeof(L)
> 262260
> ?
Both sets and lists may be over-allocated. I expect that lists created with a list comprehension may not be over-allocated, but if you append a single item to it, Python *may* resize it and quadruple or double the size. If you don't over-allocate, then *every* append becomes slow:
[a, b, c, d] append e:
copy the list, plus one extra slot:
[a, b, c, d]
[a, b, c, d, UNUSED]
insert e:
[a, b, c, d]
[a, b, c, d, e]
delete the original:
[a, b, c, d, e]
Instead of adding just one extra slot, Python quadruples (for small lists) or doubles the size, so that *nearly always* your list has plenty of extra slots to append into instantly. This makes appending an almost-constant time operation, on average, regardless of how big your list is, and big list will never use more than twice the amount of memory needed. That's a good trade off of space versus time (memory versus speed).
Sets, and dicts, are also over-allocated, because of the nature of how hash tables work. They also trade off space for time. The hash table is a big array of slots, some of which are flagged as "empty", some of which have an item in them:
[EMPTY, EMPTY, b, EMPTY, d, c, EMPTY, EMPTY, EMPTY, a, EMPTY, e]
When you do a lookup on (say) c, Python calculates a hash of c to get the index to look. Say it gets index 5, it looks at index 5 and sees c is there. This means it only needs to look in one slot instead of up to 12.
The more empty slots, the better the hash table will perform. If you would like more details on how hash tables work, you can google it, or just ask here, but the short answer is that in order to get extremely fast almost constant-time insertion, deletion and lookup of items in dicts and sets, they trade off some memory for speed.
--
Steven
More information about the Tutor
mailing list