<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">An alternative solution may be numpy.memmap: <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html" class="">https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html</a></div><div class=""><br class=""></div>If you are sure your subsequent computation on the array data has enough locality to avoid thrashing, I think numpy.memmap would work for you, i.e. it uses an explicit disk file that serves as swap.<div class=""><br class=""></div><div class="">My environment does a lot of mmap'ing of on-disk data files from C++ (after Python has read the metadata) and then wraps the mapped memory as ndarrays. That is enough to run out-of-core programs as long as the working set of the data access pattern fits in physical RAM at any instant; even scanning the whole dataset is then fine, as long as the scan is spread out along the time axis (real-world time, not a time axis in the data).</div><div class=""><br class=""></div><div class="">Memory (address space) fragmentation is a problem, as is the OS `nofile` limit (the number of file handles held open), when too many small data files are involved. We are switching to a FUSE-based filesystem that presents a virtual large file as a view over many small files on a remote storage server.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Compl<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On 2020-03-25, at 02:35, Stanley Seibert &lt;<a href="mailto:sseibert@anaconda.com" class="">sseibert@anaconda.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">In addition to what Sebastian said about memory fragmentation and OS limits on memory allocations, I do think it will be hard to work with an array that close to the memory limit in NumPy regardless. Almost any operation will need to make a temporary array and exceed your memory limit. 
You might want to look at Dask Array for a NumPy-like API for working with chunked arrays that can be staged in and out of memory:<div class=""><br class=""></div><div class=""><a href="https://docs.dask.org/en/latest/array.html" class="">https://docs.dask.org/en/latest/array.html</a><br class=""></div><div class=""><br class=""></div><div class="">As a bonus, Dask will also let you make better use of the large number of CPU cores that you likely have in your 1.9 TB RAM system. :)</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 24, 2020 at 1:00 PM Keyvis Damptey &lt;<a href="mailto:quantkeyvis@gmail.com" class="">quantkeyvis@gmail.com</a>&gt; wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto" class="">Hi NumPy dev community,<div dir="auto" class=""><br class=""></div><div dir="auto" class=""><div style="font-family:sans-serif;font-size:12.8px" dir="auto" class=""><div style="width:328px;margin:16px 0px" class=""><div class=""><div dir="ltr" class=""><div class="">I'm Keyvis, a statistical data scientist.</div><div class=""><br class=""></div><div class="">I'm currently using NumPy in Python 3.8.2 64-bit for a clustering problem, on a machine with 1.9 TB RAM. When I try using np.zeros to create a 600,000 by 600,000 matrix of dtype=np.float32, it says<br class=""></div><div class="">"Unable to allocate 1.31 TiB for an array with shape (600000, 600000) and data type float32"</div><div class=""><br class=""></div><div class="">I used psutil to determine how much RAM Python thinks it has access to, and it reported approximately 1.8 TB.<br class=""></div><div class=""><br class=""></div><div class="">Is there some way I can fix NumPy to create these large arrays?</div></div></div></div><div style="height:0px" class=""></div></div>Thanks for your time and consideration.</div></div>
_______________________________________________<br class="">
NumPy-Discussion mailing list<br class="">
<a href="mailto:NumPy-Discussion@python.org" target="_blank" class="">NumPy-Discussion@python.org</a><br class="">
<a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank" class="">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br class="">
</blockquote></div>
</div></blockquote></div><br class=""></div></body></html>