[Numpy-discussion] NumPy-Discussion Digest, Vol 162, Issue 27

Keyvis Damptey quantkeyvis at gmail.com
Tue Mar 24 18:50:02 EDT 2020


Thanks. This is soooo embarrassing, but I wasn't able to create a new
matrix because I forgot to delete the original massive matrix. I was
testing how big it could go in terms of rows/columns before reaching the
limit and forgot to delete the last object before creating a new one.
Sadly, that memory usage was not reflected in the task manager for the VM
instance.

On Tue, Mar 24, 2020, 6:44 PM <numpy-discussion-request at python.org> wrote:

> Send NumPy-Discussion mailing list submissions to
>         numpy-discussion at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/numpy-discussion
> or, via email, send a message with subject or body 'help' to
>         numpy-discussion-request at python.org
>
> You can reach the person managing the list at
>         numpy-discussion-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of NumPy-Discussion digest..."
>
>
> Today's Topics:
>
>    1. Re: Numpy doesn't use RAM (Sebastian Berg)
>    2. Re: Numpy doesn't use RAM (Stanley Seibert)
>    3. Re: Numpy doesn't use RAM (Benjamin Root)
>    4. Re: Put type annotations in NumPy proper? (Joshua Wilson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 24 Mar 2020 13:15:47 -0500
> From: Sebastian Berg <sebastian at sipsolutions.net>
> To: numpy-discussion at python.org
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
> Message-ID:
>         <fb6d9033ce95ce889c1c256c97581e471d6577bf.camel at sipsolutions.net>
> Content-Type: text/plain; charset="utf-8"
>
> On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> > Hi Numpy dev community,
> >
> > I'm keyvis, a statistical data scientist.
> >
> > I'm currently using numpy in python 3.8.2 64-bit for a clustering
> > problem,
> > on a machine with 1.9 TB RAM. When I try using np.zeros to create a
> > 600,000
> > by 600,000 matrix of dtype=np.float32 it says
> > "Unable to allocate 1.31 TiB for an array with shape (600000, 600000)
> > and data type float32"
> >
>
> If this error happens, allocating the memory failed. This should be
> pretty much a simple `malloc` call in C, so this is the kernel
> complaining, not Python/NumPy.
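The reported size matches a quick back-of-the-envelope check (plain Python; the 4 bytes per element is simply float32's item size):

```python
# A 600,000 x 600,000 float32 array needs:
rows = cols = 600_000
itemsize = 4                     # bytes per float32 element
nbytes = rows * cols * itemsize
print(nbytes)                    # 1440000000000 bytes, i.e. 1.44 TB
print(round(nbytes / 2**40, 2))  # 1.31 (TiB), matching the error message
```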
>
> I am not quite sure, but memory fragmentation may play a part, or the
> process may simply be out of memory; 1.44 TB is a significant portion
> of the total memory, after all.
>
> Not sure what to say, but I think you should probably look into other
> solutions, maybe using HDF5, zarr, or memory-mapping (although I am not
> sure the last actually helps). It will be tricky to work with arrays of
> a size that is close to the available total memory.
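As a rough sketch of the memory-mapping route mentioned above (the file name and the small shape here are made up for illustration; the real case would use the full shape and a path on a large disk):

```python
import os
import tempfile

import numpy as np

# A disk-backed array: the OS pages data in and out on demand, so the
# whole array never has to be resident in RAM at once.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
arr = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 1000))

arr[0, :] = 1.0   # writes go through to the backing file
arr.flush()

# Later, reopen read-only without loading everything into memory.
arr2 = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 1000))
print(float(arr2[0].sum()))  # 1000.0
```

Whether this helps depends on the access pattern; any operation that touches the whole array still streams all 1.44 TB through memory.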
>
> Maybe someone who works more with such data here can give you tips on
> what projects can help you or what solutions to look into.
>
> - Sebastian
>
>
>
> > I used psutil to determine how much RAM Python thinks it has access
> > to, and it returned approximately 1.8 TB.
> >
> > Is there some way I can fix numpy to create these large arrays?
> > Thanks for your time and consideration
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 24 Mar 2020 13:35:49 -0500
> From: Stanley Seibert <sseibert at anaconda.com>
> To: Discussion of Numerical Python <numpy-discussion at python.org>
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
> Message-ID:
>         <
> CADv3RKTjBo48a+eYJn7m+gpT2iASD8esaiHtZs0vqLNUNY_fbg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> In addition to what Sebastian said about memory fragmentation and OS limits
> on memory allocations, I do think it will be hard to work with an array
> that close to the memory limit in NumPy regardless.  Almost any operation
> will need to make a temporary array and would exceed your memory limit.  You
> might want to look at Dask Array for a NumPy-like API for working with
> chunked arrays that can be staged in and out of memory:
>
> https://docs.dask.org/en/latest/array.html
>
> As a bonus, Dask will also let you make better use of the large number of
> CPU cores that you likely have in your 1.9 TB RAM system.  :)
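The idea Dask builds on can be sketched in plain NumPy (a toy illustration of chunked processing, not Dask's API): operate on one block of rows at a time, so only a small slice is ever in memory.

```python
import numpy as np

def chunked_row_sums(get_chunk, n_rows, chunk_rows):
    """Per-row sums, holding only one chunk of rows in RAM at a time."""
    parts = []
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        block = get_chunk(start, stop)   # load just rows [start, stop)
        parts.append(block.sum(axis=1))
    return np.concatenate(parts)

# Toy stand-in for loading rows from disk (e.g. an HDF5 or zarr dataset).
data = np.arange(12.0).reshape(4, 3)
sums = chunked_row_sums(lambda a, b: data[a:b], n_rows=4, chunk_rows=2)
print(sums)  # [ 3. 12. 21. 30.]
```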
>
> On Tue, Mar 24, 2020 at 1:00 PM Keyvis Damptey <quantkeyvis at gmail.com>
> wrote:
>
> > Hi Numpy dev community,
> >
> > I'm keyvis, a statistical data scientist.
> >
> > I'm currently using numpy in python 3.8.2 64-bit for a clustering
> problem,
> > on a machine with 1.9 TB RAM. When I try using np.zeros to create a
> 600,000
> > by 600,000 matrix of dtype=np.float32 it says
> > "Unable to allocate 1.31 TiB for an array with shape (600000, 600000) and
> > data type float32"
> >
> > I used psutil to determine how much RAM Python thinks it has access to,
> > and it returned approximately 1.8 TB.
> >
> > Is there some way I can fix numpy to create these large arrays?
> > Thanks for your time and consideration
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> ------------------------------
>
> Message: 3
> Date: Tue, 24 Mar 2020 14:36:45 -0400
> From: Benjamin Root <ben.v.root at gmail.com>
> To: Discussion of Numerical Python <numpy-discussion at python.org>
> Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
> Message-ID:
>         <CANNq6Fk2vczBWgPPJmbxmSijViwaR=
> CQGVF18Pavi3fcbXXDZA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Another thing to point out about an array taking up that large a share of
> the available memory is that it severely restricts what you can do with it.
> Since you are above 50% of the available memory, you won't be able to
> create another array to hold the result of computing something with
> that array. So you are restricted to querying (which you could do without
> having everything in memory) or to in-place operations.
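"In-place" here means reusing an existing buffer, via ufunc `out=` arguments or augmented assignment, instead of allocating a second array-sized temporary; a small sketch:

```python
import numpy as np

a = np.ones(5, dtype=np.float32)
b = np.full(5, 2.0, dtype=np.float32)

c = a + b            # allocates a brand-new array for the result
np.add(a, b, out=a)  # writes the sum into a's existing buffer instead
a *= 2               # augmented assignment is also in-place

print(c)  # [3. 3. 3. 3. 3.]
print(a)  # [6. 6. 6. 6. 6.]
```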
>
> Dask arrays might be what you are really looking for.
>
> Ben Root
>
> On Tue, Mar 24, 2020 at 2:18 PM Sebastian Berg <sebastian at sipsolutions.net
> >
> wrote:
>
> > On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> > > Hi Numpy dev community,
> > >
> > > I'm keyvis, a statistical data scientist.
> > >
> > > I'm currently using numpy in python 3.8.2 64-bit for a clustering
> > > problem,
> > > on a machine with 1.9 TB RAM. When I try using np.zeros to create a
> > > 600,000
> > > by 600,000 matrix of dtype=np.float32 it says
> > > "Unable to allocate 1.31 TiB for an array with shape (600000, 600000)
> > > and data type float32"
> > >
> >
> > If this error happens, allocating the memory failed. This should be
> > pretty much a simple `malloc` call in C, so this is the kernel
> > complaining, not Python/NumPy.
> >
> > I am not quite sure, but memory fragmentation may play a part, or the
> > process may simply be out of memory; 1.44 TB is a significant portion
> > of the total memory, after all.
> >
> > Not sure what to say, but I think you should probably look into other
> > solutions, maybe using HDF5, zarr, or memory-mapping (although I am not
> > sure the last actually helps). It will be tricky to work with arrays of
> > a size that is close to the available total memory.
> >
> > Maybe someone who works more with such data here can give you tips on
> > what projects can help you or what solutions to look into.
> >
> > - Sebastian
> >
> >
> >
> > > I used psutil to determine how much RAM Python thinks it has access
> > > to, and it returned approximately 1.8 TB.
> > >
> > > Is there some way I can fix numpy to create these large arrays?
> > > Thanks for your time and consideration
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> >
>
> ------------------------------
>
> Message: 4
> Date: Tue, 24 Mar 2020 15:42:27 -0700
> From: Joshua Wilson <josh.craig.wilson at gmail.com>
> To: Discussion of Numerical Python <numpy-discussion at python.org>
> Subject: Re: [Numpy-discussion] Put type annotations in NumPy proper?
> Message-ID:
>         <
> CAKFGQGwChWdKGNpv2W5xpE7+Bxd-hvrgmN6BAUpG+Hgs-4-2+g at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> > That is, is this an all-or-nothing thing where as soon as we start,
> numpy-stubs becomes unusable?
>
> Until NumPy is made PEP 561 compatible by adding a `py.typed` file,
> type checkers will ignore the types in the repo, so in theory you can
> avoid the all or nothing. In practice it's maybe trickier because
> currently people can use the stubs, but they won't be able to use the
> types in the repo until the PEP 561 switch is flipped. So e.g.
> currently SciPy pulls the stubs from `numpy-stubs` master, allowing
> for a short
>
> find place where NumPy stubs are lacking -> improve stubs -> improve SciPy
> types
>
> loop. If all development moves into the main repo, then SciPy is
> blocked on NumPy becoming PEP 561 compatible before moving forward. But
> you could complain that I put the cart before the horse by introducing
> typing in the SciPy repo before the NumPy types were more settled, and
> that's probably a fair complaint.
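For context: PEP 561 compatibility means shipping a `py.typed` marker so type checkers read the package's own annotations. A hypothetical sketch of the kind of built-in-type annotation under discussion (the signature and defaults are illustrative only, not NumPy's actual stubs; `ndarray` is left as `Any`):

```python
from typing import Any, Optional, Tuple, Union

def zeros(shape: Union[int, Tuple[int, ...]],
          dtype: Optional[Any] = None,
          order: str = "C") -> Any:
    """Illustrative signature only; the real body would live in NumPy."""
    ...
```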
>
> > Anyone interested in taking the lead on this?
>
> Not that I am a core developer or anything, but I am interested in
> helping to improve typing in NumPy.
>
> On Tue, Mar 24, 2020 at 11:15 AM Eric Wieser
> <wieser.eric+numpy at gmail.com> wrote:
> >
> > >  Putting
> > > aside ndarray, as more challenging, even annotations for numpy
> functions
> > > and method parameters with built-in types would help, as a start.
> >
> > This is a good idea in principle, but one thing concerns me.
> >
> > If we add type annotations to numpy, does it become an error to have
> numpy-stubs installed?
> > That is, is this an all-or-nothing thing where as soon as we start,
> numpy-stubs becomes unusable?
> >
> > Eric
> >
> > On Tue, 24 Mar 2020 at 17:28, Roman Yurchak <rth.yurchak at gmail.com>
> wrote:
> >>
> >> Thanks for re-starting this discussion, Stephan! I think there is
> >> definitely significant interest in this topic:
> >> https://github.com/numpy/numpy/issues/7370 is the issue with the
> largest
> >> number of user likes in the issue tracker (FWIW).
> >>
> >> Having them in numpy, as opposed to a separate numpy-stubs repository
> >> would indeed be ideal from a user perspective. When looking into it in
> >> the past, I was never sure how well in sync numpy-stubs was. Putting
> >> aside ndarray, as more challenging, even annotations for numpy functions
> >> and method parameters with built-in types would help, as a start.
> >>
> >> To add to the previously listed projects that would benefit from this:
> >> we are currently considering starting to use some (minimal) type
> >> annotations in scikit-learn.
> >>
> >> --
> >> Roman Yurchak
> >>
> >> On 24/03/2020 18:00, Stephan Hoyer wrote:
> >> > When we started numpy-stubs [1] a few years ago, putting type
> >> > annotations in NumPy itself seemed premature. We still supported
> Python
> >> > 2, which meant that we would need to use awkward comments for type
> >> > annotations.
> >> >
> >> > Over the past few years, using type annotations has become
> increasingly
> >> > popular, even in the scientific Python stack. For example, off-hand I
> >> > know that at least SciPy, pandas and xarray have at least part of
> their
> >> > APIs type annotated. Even without annotations for shapes or dtypes, it
> >> > would be valuable to have near complete annotations for NumPy, the
> >> > project at the bottom of the scientific stack.
> >> >
> >> > Unfortunately, numpy-stubs never really took off. I can think of a few
> >> > reasons for that:
> >> > 1. Missing high level guidance on how to write type annotations,
> >> > particularly for how (or if) to annotate particularly dynamic parts of
> >> > NumPy (e.g., consider __array_function__), and whether we should
> >> > prioritize strictness or faithfulness [2].
> >> > 2. We didn't have a good experience for new contributors. Due to the
> >> > relatively low level of interest in the project, when a contributor
> >> > would occasionally drop in, I often didn't even notice their PR for a
> >> > few weeks.
> >> > 3. Developing type annotations separately from the main codebase makes
> >> > them a little harder to keep in sync. This means that type annotations
> >> > couldn't serve their typical purpose of self-documenting code. Part of
> >> > this may be necessary for NumPy (due to our use of C extensions), but
> >> > large parts of NumPy's user facing APIs are written in Python. We no
> >> > longer support Python 2, so at least we no longer need to worry about
> >> > putting annotations in comments.
> >> >
> >> > We eventually could probably use a formal NEP (or several) on how we
> >> > want to use type annotations in NumPy, but I think a good first step
> >> > would be to think about how to start moving the annotations from
> >> > numpy-stubs into numpy proper.
> >> >
> >> > Any thoughts? Anyone interested in taking the lead on this?
> >> >
> >> > Cheers,
> >> > Stephan
> >> >
> >> > [1] https://github.com/numpy/numpy-stubs
> >> > [2] https://github.com/numpy/numpy-stubs/issues/12
> >> >
> >> > _______________________________________________
> >> > NumPy-Discussion mailing list
> >> > NumPy-Discussion at python.org
> >> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >> >
> >>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 162, Issue 27
> *************************************************
>


More information about the NumPy-Discussion mailing list