[Numpy-discussion] array of random numbers fails to construct

Warren Weckesser warren.weckesser at gmail.com
Mon Dec 7 20:17:26 EST 2015


On Sun, Dec 6, 2015 at 6:55 PM, Allan Haldane <allanhaldane at gmail.com>
wrote:

>
> I've also often wanted to generate large datasets of random uint8 and
> uint16. As a workaround, this is something I have used:
>
> np.ndarray(100, 'u1', np.random.bytes(100))
>
> It has also crossed my mind that np.random.randint and np.random.rand
> could use an extra 'dtype' keyword.



+1.  Not a high priority, but it would be nice.
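
Until something like that exists, the bytes-based workaround quoted above can be wrapped up as a small helper. This is only a rough sketch: the name random_uint8 is made up for illustration, and it uses np.frombuffer rather than the np.ndarray constructor. Note that np.random.bytes covers the full 0..255 range, whereas np.random.randint(0, 255, ...) excludes 255.

```python
import math

import numpy as np


def random_uint8(shape):
    """Return a uint8 array of the given shape filled with random bytes.

    Hypothetical helper, not part of NumPy. Uses one byte per element,
    so no intermediate array of a wider integer type is created.
    """
    n_items = math.prod(shape)  # total number of elements (Python 3.8+)
    buf = np.random.bytes(n_items)  # n_items random bytes as a bytes object
    # frombuffer creates a view on the bytes object without copying it.
    # The result is read-only; call .copy() if a writable array is needed.
    return np.frombuffer(buf, dtype=np.uint8).reshape(shape)
```

For example, random_uint8((2**10, 2**12)) gives a 4 MiB array. A writable copy temporarily doubles the memory in use.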

Warren



> It didn't look easy to implement though.
>
> Allan
>
> On 12/06/2015 04:55 PM, DAVID SAROFF (RIT Student) wrote:
>
>> Matthew,
>>
>> That looks right. I'm concluding that the .astype(np.uint8) is applied
>> after the array is constructed, instead of during the process. This
>> random array is a test case. In the production analysis of radio
>> telescope data this is how the data comes in, and there is no problem
>> with 10GBy files.
>> linearInputData = np.fromfile(dataFile, dtype = np.uint8, count = -1)
>> spectrumArray = linearInputData.reshape(nSpectra,sizeSpectrum)
>>
>>
>> On Sun, Dec 6, 2015 at 4:07 PM, Matthew Brett
>> <matthew.brett at gmail.com> wrote:
>>
>>     Hi,
>>
>>     On Sun, Dec 6, 2015 at 12:39 PM, DAVID SAROFF (RIT Student)
>>     <dps7802 at rit.edu> wrote:
>>     > This works. A big array of eight bit random numbers is constructed:
>>     >
>>     > import numpy as np
>>     >
>>     > spectrumArray = np.random.randint(0,255, (2**20,2**12)).astype(np.uint8)
>>     >
>>     >
>>     >
>>     > This fails. It eats up all 64GBy of RAM:
>>     >
>>     > spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)
>>     >
>>     >
>>     > The difference is a factor of two, 2**21 rather than 2**20, for the
>>     > extent of the first axis.
>>
>>     I think what's happening is that this:
>>
>>     np.random.randint(0,255, (2**21,2**12))
>>
>>     creates 2**33 random integers, which (on 64-bit) will be of dtype
>>     int64 = 8 bytes, giving total size 2 ** (21 + 12 + 3) = 2 ** 36 bytes
>>     = 64 GiB.
>>
>>     Cheers,
>>
>>     Matthew
>>     _______________________________________________
>>     NumPy-Discussion mailing list
>>     NumPy-Discussion at scipy.org
>>     https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>>
>>
>> --
>> David P. Saroff
>> Rochester Institute of Technology
>> 54 Lomb Memorial Dr, Rochester, NY 14623
>> david.saroff at mail.rit.edu | (434) 227-6242
>>
>>
>>
>>
>>
>
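
For reference, the size estimate discussed in the quoted thread can be checked by multiplying the element count by the item size. The sketch below assumes that randint returns int64 by default, as it does on 64-bit Linux; array_nbytes is a made-up helper name, and nothing is actually allocated.

```python
import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed for an array of this shape and dtype (no allocation)."""
    n_items = 1
    for extent in shape:
        n_items *= extent
    return n_items * np.dtype(dtype).itemsize


GiB = 2**30
shape = (2**21, 2**12)  # the failing case from the thread

print(array_nbytes(shape, np.int64) / GiB)  # 64.0, the intermediate randint result
print(array_nbytes(shape, np.uint8) / GiB)  # 8.0, the array after .astype(np.uint8)
```

With 2**20 rows the intermediate int64 array needs 32 GiB, which is consistent with that case fitting on a 64 GB machine while the 2**21 case does not.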

