Pickle caching objects?
José María Mateos
chema at rinzewind.org
Sat Nov 30 17:05:45 EST 2019
Hi,
I just asked this question on the IRC channel but didn't get an answer,
though some people replied with suggestions that expanded the question
a bit.
I have a program that has to read some pickle files, perform some
operations on them, and then return. The pickle objects I am reading all
have the same structure, which consists of a single list with two
elements: the first one is a long list, the second one is a numpy
object.
I found that, after calling that function, the memory used by the
Python process (monitored with htop -- the whole thing runs on Python
3.6 on Ubuntu 16.04, a fairly standard conda installation with a few
packages added via `conda install`) grows in proportion to the size of
the pickle file being read. My intuition is that this memory should be
freed once the function returns.
Does pickle keep a cache of objects in memory after they have been
returned? I thought that could be the answer, but then someone
suggested measuring how long it takes to load the objects. This is a
script I wrote to test that; nothing(filepath) simply loads the pickle
file, discards the result, and returns how long the load operation
took.
---
import glob
import pickle
import timeit
import os
import psutil


def nothing(filepath):
    start = timeit.default_timer()
    with open(filepath, 'rb') as f:
        _ = pickle.load(f)
    return timeit.default_timer() - start


if __name__ == "__main__":
    filelist = glob.glob('/tmp/test/*.pk')
    for i, filepath in enumerate(filelist):
        print("Size of file {}: {}".format(i, os.path.getsize(filepath)))
        print("First call:", nothing(filepath))
        print("Second call:", nothing(filepath))
        print("Memory usage:", psutil.Process(os.getpid()).memory_info().rss)
        print()
---
This is the output of the second time the script was run, to avoid any
effects of potential IO caches:
---
Size of file 0: 11280531
First call: 0.1466723980847746
Second call: 0.10044755204580724
Memory usage: 49418240
Size of file 1: 8955825
First call: 0.07904054620303214
Second call: 0.07996074995025992
Memory usage: 49831936
Size of file 2: 43727266
First call: 0.37741047400049865
Second call: 0.38176894187927246
Memory usage: 49758208
Size of file 3: 31122090
First call: 0.271301960805431
Second call: 0.27462846506386995
Memory usage: 49991680
Size of file 4: 634456686
First call: 5.526095286011696
Second call: 5.558765463065356
Memory usage: 539324416
Size of file 5: 3349952658
First call: 29.50982437795028
Second call: 29.461691531119868
Memory usage: 3443597312
Size of file 6: 9384929
First call: 0.0826977719552815
Second call: 0.08362263604067266
Memory usage: 3443597312
Size of file 7: 422137
First call: 0.0057482069823890924
Second call: 0.005949910031631589
Memory usage: 3443597312
Size of file 8: 409458799
First call: 3.562588643981144
Second call: 3.6001368327997625
Memory usage: 3441451008
Size of file 9: 44843816
First call: 0.39132978999987245
Second call: 0.398518088972196
Memory usage: 3441451008
---
Notice that memory usage increases noticeably, especially for files 4
and 5, the biggest ones, and doesn't come down afterwards as I would
expect. But the loading time is the same on the first and second call
for every file, so I think I can rule out any pickle-level caching
mechanism.
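To double-check that nothing is being retained at the Python level, a
tracemalloc-based variant of the test might help. This is only a
sketch: the payload below (a long list plus a large bytes blob standing
in for the numpy object) and the temporary file are made up to mimic
the structure of my pickles.

```python
import os
import pickle
import tempfile
import tracemalloc


def load_and_discard(filepath):
    # Load the pickle and let the result go out of scope on return.
    with open(filepath, 'rb') as f:
        _ = pickle.load(f)


# Build a stand-in pickle with the structure described above: a list
# whose first element is a long list and whose second element is a
# large binary blob (taking the place of the numpy object).
payload = [list(range(1_000_000)), b'\x00' * 50_000_000]
with tempfile.NamedTemporaryFile(suffix='.pk', delete=False) as tmp:
    pickle.dump(payload, tmp)
path = tmp.name
del payload

tracemalloc.start()
load_and_discard(path)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
os.remove(path)

# If pickle kept no reference, `current` stays tiny while `peak`
# reflects the full size of the loaded object.
print("bytes still traced after return:", current)
print("peak bytes during load:", peak)
```

If `current` comes back tiny while `peak` is roughly the object size,
the objects really are freed by Python, and the RSS growth would have
to come from somewhere below the interpreter.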
So I guess my question now is: can anyone give me any pointers as to
why this is happening? Any help is appreciated.
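One thing I might try next, if the memory really is free at the Python
level, is asking the C allocator to hand its free pages back to the
OS. On glibc that's `malloc_trim`; this is just a sketch, Linux-only,
and guarded in case the symbol isn't available on a given platform:

```python
import ctypes
import ctypes.util

# malloc_trim is glibc-specific: if RSS drops after calling it, the
# memory was already free at the Python level and was merely being
# held by the allocator's arenas rather than leaked.
libc_name = ctypes.util.find_library("c")
if libc_name is not None:
    libc = ctypes.CDLL(libc_name)
    try:
        released = libc.malloc_trim(0)  # 1 if memory was returned to the OS
        print("malloc_trim released memory:", bool(released))
    except AttributeError:
        print("malloc_trim not available on this platform")
```

Comparing psutil's RSS reading before and after the call would show
whether the allocator was holding on to the freed pages.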
Thanks,
--
José María (Chema) Mateos || https://rinzewind.org/