array of random numbers fails to construct

This works. A big array of eight-bit random numbers is constructed:

import numpy as np
spectrumArray = np.random.randint(0,255, (2**20,2**12)).astype(np.uint8)

This fails. It eats up all 64 GB of RAM:

spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)

The difference is a factor of two, 2**21 rather than 2**20, for the extent of the first axis.

--
David P. Saroff
Rochester Institute of Technology
54 Lomb Memorial Dr, Rochester, NY 14623
david.saroff@mail.rit.edu | (434) 227-6242

Hi,

On Sun, Dec 6, 2015 at 12:39 PM, DAVID SAROFF (RIT Student) <dps7802@rit.edu> wrote:
> This fails. It eats up all 64 GB of RAM:
> spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)

I think what's happening is that this:

np.random.randint(0,255, (2**21,2**12))

creates 2**33 random integers, which (on 64-bit) will be of dtype int64 = 8 bytes, giving total size 2 ** (21 + 12 + 6) = 2 ** 39 bytes = 512 GiB.

Cheers,
Matthew

On Sun, Dec 6, 2015 at 10:07 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> I think what's happening is that this:
>
> np.random.randint(0,255, (2**21,2**12))
>
> creates 2**33 random integers, which (on 64-bit) will be of dtype int64 = 8 bytes, giving total size 2 ** (21 + 12 + 6) = 2 ** 39 bytes = 512 GiB.

8 is only 2**3, so it is "just" 64 GiB, which also explains why the half-sized array does work, but yes, that is most likely what's happening.

Jaime

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
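Jaime's corrected arithmetic can be checked with a small-scale version of the same call; the shapes below are shrunk so the sketch runs in a few kilobytes, then the per-element cost is scaled up to the failing 2**21 x 2**12 shape.

```python
import numpy as np

# Small-scale version of the same call: randint returns the platform
# default integer dtype (int64 on most 64-bit systems), not uint8.
a = np.random.randint(0, 255, (2**4, 2**3))
print(a.dtype)        # int64 on most 64-bit platforms
print(a.itemsize)     # 8 bytes per element there

# Scale that per-element cost up to the failing shape:
n_elements = 2**21 * 2**12       # 2**33 integers
bytes_needed = n_elements * 8    # 8 bytes each -> 2**36 bytes
print(bytes_needed / 2**30)      # 64.0 (GiB), matching Jaime's figure
```

The temporary int64 array alone fills the 64 GB machine before .astype(np.uint8) ever runs.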

Matthew,

That looks right. I'm concluding that the .astype(np.uint8) is applied after the array is constructed, instead of during the process. This random array is a test case. In the production analysis of radio telescope data this is how the data comes in, and there is no problem with 10 GB files:

linearInputData = np.fromfile(dataFile, dtype=np.uint8, count=-1)
spectrumArray = linearInputData.reshape(nSpectra, sizeSpectrum)

On Sun, Dec 6, 2015 at 4:07 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> I think what's happening is that this:
>
> np.random.randint(0,255, (2**21,2**12))
>
> creates 2**33 random integers, which (on 64-bit) will be of dtype int64 = 8 bytes, giving total size 2 ** (21 + 12 + 6) = 2 ** 39 bytes = 512 GiB.

--
David P. Saroff
Rochester Institute of Technology
54 Lomb Memorial Dr, Rochester, NY 14623
david.saroff@mail.rit.edu | (434) 227-6242
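For a random test array of the original size, one way around the int64 intermediate is to fill a preallocated uint8 array in row blocks, so the transient default-int output of randint only ever covers one block at a time. This is a sketch, not anything from the thread; the shapes are small stand-ins for the 2**21 x 2**12 case, and `block` is an arbitrary chunk size.

```python
import numpy as np

# Small stand-ins for the 2**21 x 2**12 production shape.
n_rows, n_cols, block = 2**10, 2**6, 2**8

out = np.empty((n_rows, n_cols), dtype=np.uint8)
for start in range(0, n_rows, block):
    stop = min(start + block, n_rows)
    # The assignment casts randint's default-int block down to uint8
    # in place; only one block of int64 values exists at a time.
    out[start:stop] = np.random.randint(0, 256, (stop - start, n_cols))

print(out.dtype, out.shape)
```

Peak transient memory is then `block * n_cols * 8` bytes rather than eight times the whole array.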

I've also often wanted to generate large datasets of random uint8 and uint16. As a workaround, this is something I have used:

np.ndarray(100, 'u1', np.random.bytes(100))

It has also crossed my mind that np.random.randint and np.random.rand could use an extra 'dtype' keyword. It didn't look easy to implement though.

Allan

On 12/06/2015 04:55 PM, DAVID SAROFF (RIT Student) wrote:
> That looks right. I'm concluding that the .astype(np.uint8) is applied after the array is constructed, instead of during the process.
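Allan's workaround passes an existing byte string as the `buffer` argument of the np.ndarray constructor, so the bytes object itself is the backing store and no default-int stage is needed. A sketch, with sizes chosen arbitrarily:

```python
import numpy as np

raw = np.random.bytes(2**12)            # 4096 random bytes

# Wrap the buffer without copying; the bytes object backs the array.
arr8 = np.ndarray(2**12, 'u1', raw)     # view as 4096 uint8
arr16 = np.ndarray(2**11, '<u2', raw)   # or as 2048 little-endian uint16

print(arr8.shape, arr8.dtype)
print(arr16.shape, arr16.dtype)
print(arr8.flags['WRITEABLE'])          # False: bytes objects are immutable
```

The same buffer can be reinterpreted at any itemsize that divides its length; the resulting arrays are read-only, a point that comes up again later in the thread.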

Allan,

I see with a Google search on your name that you are in the physics department at Rutgers. I got my BA in Physics there. 1975. Biological physics. A thought: Is there an entropy that can be assigned to the DNA in an organism? I don't mean the usual thing, coupled to the heat bath. Evolution blindly explores metabolic and signalling pathways, and tends towards disorder, as long as it functions. Someone working out signaling pathways some years ago wrote that they were senselessly complex, branched and interlocked. I think that is to be expected. Evolution doesn't find minimalist, clear, rational solutions. Look at the Amazon rain forest. What are all those beetles and butterflies and frogs for? It is the wrong question. I think some measure of the complexity could be related to the amount of time that ecosystem has existed. Similarly for genomes.

On Sun, Dec 6, 2015 at 6:55 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
> I've also often wanted to generate large datasets of random uint8 and uint16.

David,

> I'm concluding that the .astype(np.uint8) is applied after the array is constructed, instead of during the process.

That is how Python works in general. astype is a method of an array, so randint needs to return the array before there is something with an astype method to call. A dtype keyword arg to randint, on the other hand, would influence the construction of the array.

Elliot
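For the record, randint did later grow exactly this keyword (NumPy 1.11 and later). A minimal sketch of the difference Elliot describes, with a tiny shape so both lines are cheap:

```python
import numpy as np

# With .astype, a full default-int array exists first, then a uint8
# copy of it is made: two allocations.
a = np.random.randint(0, 256, (4, 4)).astype(np.uint8)

# With the dtype keyword (added to randint in NumPy 1.11, after this
# thread), the array is constructed as uint8 from the start: one
# allocation, one eighth the peak memory on 64-bit platforms.
b = np.random.randint(0, 256, (4, 4), dtype=np.uint8)

print(a.dtype, b.dtype)   # uint8 uint8
```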

On Sun, Dec 6, 2015 at 6:55 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
> I've also often wanted to generate large datasets of random uint8 and uint16. As a workaround, this is something I have used:
>
> np.ndarray(100, 'u1', np.random.bytes(100))
>
> It has also crossed my mind that np.random.randint and np.random.rand could use an extra 'dtype' keyword.

+1. Not a high priority, but it would be nice.

Warren

On 12/08/2015 02:17 AM, Warren Weckesser wrote:
> On Sun, Dec 6, 2015 at 6:55 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
>> It has also crossed my mind that np.random.randint and np.random.rand could use an extra 'dtype' keyword.
>
> +1. Not a high priority, but it would be nice.

Opened an issue for this: https://github.com/numpy/numpy/issues/6790

Sebastian

On Sun, Dec 6, 2015 at 3:55 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
> I've also often wanted to generate large datasets of random uint8 and uint16. As a workaround, this is something I have used:
>
> np.ndarray(100, 'u1', np.random.bytes(100))

Another workaround that avoids creating a copy is to use the view method, e.g.,

np.random.randint(np.iinfo(int).min, np.iinfo(int).max, size=(1,)).view(np.uint8)  # creates 8 random bytes

Cheers,
Stephan
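Stephan's trick works because .view does not touch the data at all: it lays a different dtype over the same int64 buffer, so each 8-byte integer is reinterpreted as 8 separate bytes. A small sketch:

```python
import numpy as np

# Draw a few full-range int64 values, then reinterpret the buffer as
# bytes. No data is copied; only the dtype changes.
ints = np.random.randint(np.iinfo(np.int64).min, np.iinfo(np.int64).max,
                         size=4, dtype=np.int64)
as_bytes = ints.view(np.uint8)

print(as_bytes.shape)          # (32,): 4 int64 values -> 32 bytes
print(as_bytes.base is ints)   # True: both arrays share one buffer
```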

Hi,

On Tue, Dec 8, 2015 at 4:40 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
> Another workaround that avoids creating a copy is to use the view method, e.g.,
>
> np.random.randint(np.iinfo(int).min, np.iinfo(int).max, size=(1,)).view(np.uint8)  # creates 8 random bytes

I think that is not quite (pseudo) random, because the second parameter to randint is the max value plus 1 - and:

np.random.randint(np.iinfo(int).min, np.iinfo(int).max + 1, size=(1,)).view(np.uint8)

gives:

OverflowError: Python int too large to convert to C long

Cheers,
Matthew
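Matthew's point is that randint's upper bound is exclusive, so passing the type's max value leaves one value unreachable. The same off-by-one is visible at byte scale, and it also lurks in the original post's randint(0, 255). A quick check (the sample size is arbitrary):

```python
import numpy as np

# randint's upper bound is exclusive: randint(0, 255) can never
# produce 255, so one of the 256 byte values is missing.
vals = np.random.randint(0, 255, size=100_000)
print(vals.max())   # at most 254

# randint(0, 256) covers the full byte range 0..255.
full = np.random.randint(0, 256, size=100_000)
print(full.min(), full.max())
```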

On 12/08/2015 07:40 PM, Stephan Hoyer wrote:
> Another workaround that avoids creating a copy is to use the view method, e.g.,
>
> np.random.randint(np.iinfo(int).min, np.iinfo(int).max, size=(1,)).view(np.uint8)  # creates 8 random bytes

Just to note, the line I pasted doesn't copy either, according to the OWNDATA flag.

Cheers,
Allan

On 12/08/2015 08:01 PM, Allan Haldane wrote:
> Just to note, the line I pasted doesn't copy either, according to the OWNDATA flag.

Oops, but I forgot my version is read-only. If you want to write to it you do need to make a copy, that's true.

Allan
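The flags make the trade-off between the two workarounds concrete: neither copies, but only the view route is writeable, because the bytes object backing Allan's version is immutable. A sketch:

```python
import numpy as np

raw = np.random.bytes(16)

# Buffer-wrapping version: no copy, but read-only, because the
# underlying bytes object is immutable.
wrapped = np.ndarray(16, 'u1', raw)
print(wrapped.flags['OWNDATA'], wrapped.flags['WRITEABLE'])   # False False

# The .view route: also no copy (the view doesn't own its data),
# but writeable, since the randint array it reinterprets is.
viewed = np.random.randint(0, 2**31, size=2).view(np.uint8)
print(viewed.flags['OWNDATA'], viewed.flags['WRITEABLE'])     # False True

# Writing through the read-only wrapper requires an explicit copy.
writable = wrapped.copy()
writable[0] = 42
```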
participants (8)

- Allan Haldane
- DAVID SAROFF (RIT Student)
- Elliot Hallmark
- Jaime Fernández del Río
- Matthew Brett
- Sebastian
- Stephan Hoyer
- Warren Weckesser