<div dir="auto">Thanks. This is so embarrassing, but I wasn't able to create the new matrix because I forgot to delete the original massive one. I was testing how many rows and columns I could allocate before hitting the limit, and I forgot to delete the last object before creating a new one.<div dir="auto"> Sadly, that memory usage was not reflected in the task manager for the VM instance.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 24, 2020, 6:44 PM  <<a href="mailto:numpy-discussion-request@python.org">numpy-discussion-request@python.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send NumPy-Discussion mailing list submissions to<br>

        <a href="mailto:numpy-discussion@python.org" target="_blank" rel="noreferrer">numpy-discussion@python.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:numpy-discussion-request@python.org" target="_blank" rel="noreferrer">numpy-discussion-request@python.org</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:numpy-discussion-owner@python.org" target="_blank" rel="noreferrer">numpy-discussion-owner@python.org</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of NumPy-Discussion digest..."<br>

<br>

<br>

Today's Topics:<br>

<br>

   1. Re: Numpy doesn't use RAM (Sebastian Berg)<br>

   2. Re: Numpy doesn't use RAM (Stanley Seibert)<br>

   3. Re: Numpy doesn't use RAM (Benjamin Root)<br>

   4. Re: Put type annotations in NumPy proper? (Joshua Wilson)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Tue, 24 Mar 2020 13:15:47 -0500<br>

From: Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net" target="_blank" rel="noreferrer">sebastian@sipsolutions.net</a>><br>

To: <a href="mailto:numpy-discussion@python.org" target="_blank" rel="noreferrer">numpy-discussion@python.org</a><br>

Subject: Re: [Numpy-discussion] Numpy doesn't use RAM<br>

Message-ID:<br>

        <<a href="mailto:fb6d9033ce95ce889c1c256c97581e471d6577bf.camel@sipsolutions.net" target="_blank" rel="noreferrer">fb6d9033ce95ce889c1c256c97581e471d6577bf.camel@sipsolutions.net</a>><br>

Content-Type: text/plain; charset="utf-8"<br>

<br>

On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:<br>

> Hi Numpy dev community,<br>

> <br>

> I'm keyvis, a statistical data scientist.<br>

> <br>

> I'm currently using numpy in python 3.8.2 64-bit for a clustering<br>

> problem,<br>

> on a machine with 1.9 TB RAM. When I try using np.zeros to create a<br>

> 600,000<br>

> by 600,000 matrix of dtype=np.float32 it says<br>

> "Unable to allocate 1.31 TiB for an array with shape (600000, 600000)<br>

> and<br>

> <br>

> data type float32"<br>

> <br>

<br>

If this error happens, allocating the memory failed. This should be<br>

pretty much a simple `malloc` call in C, so this is the kernel<br>

complaining, not Python/NumPy.<br>

<br>

I am not quite sure, but maybe memory fragmentation plays a part, or<br>

the process is simply out of memory; 1.44 TB is a<br>

significant portion of the total memory after all.<br>
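
<br>

As a rough check, the 1.31 TiB in the error message and the 1.44 TB figure describe the same allocation, once in binary and once in decimal units.<br>

A quick sketch of the arithmetic in plain Python (the variable names are only illustrative):<br>

<pre>
```python
# Size of a 600,000 x 600,000 float32 array, in bytes.
rows = 600_000
cols = 600_000
itemsize = 4  # bytes per float32 element

nbytes = rows * cols * itemsize
size_tb = nbytes / 10**12   # decimal terabytes
size_tib = nbytes / 2**40   # binary tebibytes, the unit used in the error message

print(f"{nbytes} bytes = {size_tb:.2f} TB = {size_tib:.2f} TiB")
```
</pre>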

<br>

Not sure what to say, but I think you should probably look into other<br>

solutions, maybe using HDF5, zarr, or memory-mapping (although I am not<br>

sure the last actually helps). It will be tricky to work with arrays of<br>

a size that is close to the available total memory.<br>
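
<br>

To make the memory-mapping option concrete, here is a minimal sketch using `np.memmap` on a deliberately small array.<br>

The file name and shape are placeholders, and whether this helps at the 1.3 TiB scale will depend on disk speed and access pattern:<br>

<pre>
```python
import os
import tempfile

import numpy as np

# Placeholder path and a small shape so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "matrix.dat")
shape = (1000, 1000)

# mode="w+" creates the backing file; the OS pages data in on access.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
mm[0, :] = 1.0   # write one row
mm.flush()       # push pending changes to disk

# Reopen read-only; the data is backed by the file rather than by a
# single in-memory allocation.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
row_sum = float(ro[0].sum())
print(row_sum)
```
</pre>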

<br>

Maybe someone who works more with such data here can give you tips on<br>

what projects can help you or what solutions to look into.<br>

<br>

- Sebastian<br>

<br>

<br>

<br>

> I used psutils to determine how much RAM python thinks it has access<br>

> to and<br>

> it return with 1.8 TB approx.<br>

> <br>

> Is there some way I can fix numpy to create these large arrays?<br>

> Thanks for your time and consideration<br>

> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@python.org" target="_blank" rel="noreferrer">NumPy-Discussion@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

<br>


<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Tue, 24 Mar 2020 13:35:49 -0500<br>

From: Stanley Seibert <<a href="mailto:sseibert@anaconda.com" target="_blank" rel="noreferrer">sseibert@anaconda.com</a>><br>

To: Discussion of Numerical Python <<a href="mailto:numpy-discussion@python.org" target="_blank" rel="noreferrer">numpy-discussion@python.org</a>><br>

Subject: Re: [Numpy-discussion] Numpy doesn't use RAM<br>

Message-ID:<br>

        <<a href="mailto:CADv3RKTjBo48a%2BeYJn7m%2BgpT2iASD8esaiHtZs0vqLNUNY_fbg@mail.gmail.com" target="_blank" rel="noreferrer">CADv3RKTjBo48a+eYJn7m+gpT2iASD8esaiHtZs0vqLNUNY_fbg@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="utf-8"<br>

<br>

In addition to what Sebastian said about memory fragmentation and OS limits<br>

on memory allocations, I do think it will be hard to work with an array<br>

that close to the memory limit in NumPy regardless.  Almost any operation<br>

will need to make a temporary array and exceed your memory limit.  You<br>

might want to look at Dask Array for a NumPy-like API for working with<br>

chunked arrays that can be staged in and out of memory:<br>

<br>

<a href="https://docs.dask.org/en/latest/array.html" rel="noreferrer noreferrer" target="_blank">https://docs.dask.org/en/latest/array.html</a><br>

<br>

As a bonus, Dask will also let you make better use of the large number of<br>

CPU cores that you likely have in your 1.9 TB RAM system.  :)<br>
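
<br>

A minimal sketch of the chunked-array idea, assuming the `dask` package is installed.<br>

The shape and chunk size here are kept small and are purely illustrative:<br>

<pre>
```python
import dask.array as da

# A lazily evaluated array split into 1000 x 1000 blocks; nothing is
# allocated until a result is requested.
x = da.ones((4000, 4000), chunks=(1000, 1000), dtype="float32")

# The reduction is computed block by block, so the full array never has
# to exist in memory as one allocation.
total = float(x.sum().compute())
print(total)
```
</pre>

At the scale discussed in this thread, the same pattern would apply with a larger shape and a chunk size chosen to fit comfortably in memory.<br>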

<br>



<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Tue, 24 Mar 2020 14:36:45 -0400<br>

From: Benjamin Root <<a href="mailto:ben.v.root@gmail.com" target="_blank" rel="noreferrer">ben.v.root@gmail.com</a>><br>

To: Discussion of Numerical Python <<a href="mailto:numpy-discussion@python.org" target="_blank" rel="noreferrer">numpy-discussion@python.org</a>><br>

Subject: Re: [Numpy-discussion] Numpy doesn't use RAM<br>

Message-ID:<br>

        <CANNq6Fk2vczBWgPPJmbxmSijViwaR=<a href="mailto:CQGVF18Pavi3fcbXXDZA@mail.gmail.com" target="_blank" rel="noreferrer">CQGVF18Pavi3fcbXXDZA@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="utf-8"<br>

<br>

Another thing to point out about having an array of that percentage of the<br>

available memory is that it severely restricts what you can do with it.<br>

Since you are above 50% of the available memory, you won't be able to<br>

create another array that would be the result of computing something with<br>

that array. So, you are restricted to querying (which you could do without<br>

having everything in-memory), or in-place operations.<br>
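
<br>

The difference between creating a result array and working in place can be sketched as follows.<br>

The small array is only a stand-in for the large one, and the particular operations are arbitrary examples:<br>

<pre>
```python
import numpy as np

a = np.ones((1000, 1000), dtype=np.float32)  # stand-in for the huge array

# Out-of-place: allocates a second array of the same size for the result.
b = a * 2

# In-place: the result is written into the existing buffer, so no second
# full-size array is created.
np.multiply(a, 2, out=a)
a += 1

print(np.shares_memory(a, b))  # b is a separate allocation
```
</pre>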

<br>

Dask arrays might be what you are really looking for.<br>

<br>

Ben Root<br>

<br>



<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Tue, 24 Mar 2020 15:42:27 -0700<br>

From: Joshua Wilson <<a href="mailto:josh.craig.wilson@gmail.com" target="_blank" rel="noreferrer">josh.craig.wilson@gmail.com</a>><br>

To: Discussion of Numerical Python <<a href="mailto:numpy-discussion@python.org" target="_blank" rel="noreferrer">numpy-discussion@python.org</a>><br>

Subject: Re: [Numpy-discussion] Put type annotations in NumPy proper?<br>

Message-ID:<br>

        <<a href="mailto:CAKFGQGwChWdKGNpv2W5xpE7%2BBxd-hvrgmN6BAUpG%2BHgs-4-2%2Bg@mail.gmail.com" target="_blank" rel="noreferrer">CAKFGQGwChWdKGNpv2W5xpE7+Bxd-hvrgmN6BAUpG+Hgs-4-2+g@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="UTF-8"<br>

<br>

> That is, is this an all-or-nothing thing where as soon as we start, numpy-stubs becomes unusable?<br>

<br>

Until NumPy is made PEP 561 compatible by adding a `py.typed` file,<br>

type checkers will ignore the types in the repo, so in theory you can<br>

avoid the all or nothing. In practice it's maybe trickier because<br>

currently people can use the stubs, but they won't be able to use the<br>

types in the repo until the PEP 561 switch is flipped. So e.g.<br>

currently SciPy pulls the stubs from `numpy-stubs` master, allowing<br>

for a short<br>

<br>

find place where NumPy stubs are lacking -> improve stubs -> improve SciPy types<br>

<br>

loop. If all development moves into the main repo then SciPy is<br>

blocked on it becoming PEP 561 compatible before moving forward. But,<br>

you could complain that I put the cart before the horse with<br>

introducing typing in the SciPy repo before the NumPy types were more<br>

resolved, and that's probably a fair complaint.<br>
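
<br>

For context, the PEP 561 opt-in comes down to shipping an empty marker file named `py.typed` inside the package.<br>

A minimal sketch of the packaging side, assuming a setuptools-based project and a hypothetical package called `mypkg` (this is not NumPy's actual build configuration):<br>

<pre>
```python
# Hypothetical setuptools arguments for a package named "mypkg".
# The empty file mypkg/py.typed is the PEP 561 marker; listing it in
# package_data makes sure it is included in the installed package.
setup_kwargs = dict(
    name="mypkg",
    packages=["mypkg"],
    package_data={"mypkg": ["py.typed"]},
)

# In a real setup.py this would be: setuptools.setup(**setup_kwargs)
print(setup_kwargs["package_data"])
```
</pre>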

<br>

> Anyone interested in taking the lead on this?<br>

<br>

Not that I am a core developer or anything, but I am interested in<br>

helping to improve typing in NumPy.<br>

<br>

On Tue, Mar 24, 2020 at 11:15 AM Eric Wieser<br>

<<a href="mailto:wieser.eric%2Bnumpy@gmail.com" target="_blank" rel="noreferrer">wieser.eric+numpy@gmail.com</a>> wrote:<br>

><br>

> >  Putting<br>

> > aside ndarray, as more challenging, even annotations for numpy functions<br>

> > and method parameters with built-in types would help, as a start.<br>

><br>

> This is a good idea in principle, but one thing concerns me.<br>

><br>

> If we add type annotations to numpy, does it become an error to have numpy-stubs installed?<br>

> That is, is this an all-or-nothing thing where as soon as we start, numpy-stubs becomes unusable?<br>

><br>

> Eric<br>

><br>

> On Tue, 24 Mar 2020 at 17:28, Roman Yurchak <<a href="mailto:rth.yurchak@gmail.com" target="_blank" rel="noreferrer">rth.yurchak@gmail.com</a>> wrote:<br>

>><br>

>> Thanks for re-starting this discussion, Stephan! I think there is<br>

>> definitely significant interest in this topic:<br>

>> <a href="https://github.com/numpy/numpy/issues/7370" rel="noreferrer noreferrer" target="_blank">https://github.com/numpy/numpy/issues/7370</a> is the issue with the largest<br>

>> number of user likes in the issue tracker (FWIW).<br>

>><br>

>> Having them in numpy, as opposed to a separate numpy-stubs repository<br>

>> would indeed be ideal from a user perspective. When looking into it in<br>

>> the past, I was never sure how well in sync numpy-stubs was. Putting<br>

>> aside ndarray, as more challenging, even annotations for numpy functions<br>

>> and method parameters with built-in types would help, as a start.<br>

>><br>

>> To add to the previously listed projects that would benefit from this,<br>

>> we are currently considering to start using some (minimal) type<br>

>> annotations in scikit-learn.<br>

>><br>

>> --<br>

>> Roman Yurchak<br>

>><br>

>> On 24/03/2020 18:00, Stephan Hoyer wrote:<br>

>> > When we started numpy-stubs [1] a few years ago, putting type<br>

>> > annotations in NumPy itself seemed premature. We still supported Python<br>

>> > 2, which meant that we would need to use awkward comments for type<br>

>> > annotations.<br>

>> ><br>

>> > Over the past few years, using type annotations has become increasingly<br>

>> > popular, even in the scientific Python stack. For example, off-hand I<br>

>> > know that at least SciPy, pandas and xarray have at least part of their<br>

>> > APIs type annotated. Even without annotations for shapes or dtypes, it<br>

>> > would be valuable to have near complete annotations for NumPy, the<br>

>> > project at the bottom of the scientific stack.<br>

>> ><br>

>> > Unfortunately, numpy-stubs never really took off. I can think of a few<br>

>> > reasons for that:<br>

>> > 1. Missing high level guidance on how to write type annotations,<br>

>> > particularly for how (or if) to annotate particularly dynamic parts of<br>

>> > NumPy (e.g., consider __array_function__), and whether we should<br>

>> > prioritize strictness or faithfulness [2].<br>

>> > 2. We didn't have a good experience for new contributors. Due to the<br>

>> > relatively low level of interest in the project, when a contributor<br>

>> > would occasionally drop in, I often didn't even notice their PR for a<br>

>> > few weeks.<br>

>> > 3. Developing type annotations separately from the main codebase makes<br>

>> > them a little harder to keep in sync. This means that type annotations<br>

>> > couldn't serve their typical purpose of self-documenting code. Part of<br>

>> > this may be necessary for NumPy (due to our use of C extensions), but<br>

>> > large parts of NumPy's user facing APIs are written in Python. We no<br>

>> > longer support Python 2, so at least we no longer need to worry about<br>

>> > putting annotations in comments.<br>

>> ><br>

>> > We eventually could probably use a formal NEP (or several) on how we<br>

>> > want to use type annotations in NumPy, but I think a good first step<br>

>> > would be to think about how to start moving the annotations from<br>

>> > numpy-stubs into numpy proper.<br>

>> ><br>

>> > Any thoughts? Anyone interested in taking the lead on this?<br>

>> ><br>

>> > Cheers,<br>

>> > Stephan<br>

>> ><br>

>> > [1] <a href="https://github.com/numpy/numpy-stubs" rel="noreferrer noreferrer" target="_blank">https://github.com/numpy/numpy-stubs</a><br>

>> > [2] <a href="https://github.com/numpy/numpy-stubs/issues/12" rel="noreferrer noreferrer" target="_blank">https://github.com/numpy/numpy-stubs/issues/12</a><br>

>> ><br>

>> > _______________________________________________<br>

>> > NumPy-Discussion mailing list<br>

>> > <a href="mailto:NumPy-Discussion@python.org" target="_blank" rel="noreferrer">NumPy-Discussion@python.org</a><br>

>> > <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

>> ><br>

>><br>

>> _______________________________________________<br>

>> NumPy-Discussion mailing list<br>

>> <a href="mailto:NumPy-Discussion@python.org" target="_blank" rel="noreferrer">NumPy-Discussion@python.org</a><br>

>> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

><br>

> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@python.org" target="_blank" rel="noreferrer">NumPy-Discussion@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

<br>

<br>

------------------------------<br>

<br>

Subject: Digest Footer<br>

<br>

_______________________________________________<br>

NumPy-Discussion mailing list<br>

<a href="mailto:NumPy-Discussion@python.org" target="_blank" rel="noreferrer">NumPy-Discussion@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

<br>

<br>

------------------------------<br>

<br>

End of NumPy-Discussion Digest, Vol 162, Issue 27<br>

*************************************************<br>

</blockquote></div>