On Sun, Dec 6, 2015 at 6:55 PM, Allan Haldane <allanhaldane@gmail.com> wrote:

I've also often wanted to generate large datasets of random uint8 and uint16. As a workaround, this is something I have used:

np.ndarray(100, 'u1', np.random.bytes(100))

It has also crossed my mind that np.random.randint and np.random.rand could use an extra 'dtype' keyword.
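
For reference, a minimal sketch of the bytes-based workaround, written with np.frombuffer instead of the np.ndarray constructor. The sizes are arbitrary. The last call assumes a NumPy release whose randint accepts a 'dtype' keyword (added in 1.11, after this thread); older releases reject that keyword.

```python
import numpy as np

n = 100

# One random byte per element; no wider intermediate array is created.
raw = np.random.bytes(n)

# frombuffer wraps the bytes object without copying, so the result is read-only.
view = np.frombuffer(raw, dtype=np.uint8)

# Copy when a writable array is needed.
writable = view.copy()

# Assumption: NumPy >= 1.11, where randint accepts dtype. The upper bound is
# exclusive, so 256 is needed to cover the full uint8 range 0..255.
direct = np.random.randint(0, 256, size=(4, 8), dtype=np.uint8)
```

The bytes route costs one byte per element plus the copy if a writable array is wanted.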


+1.  Not a high priority, but it would be nice.

Warren

 
It didn't look easy to implement though.

Allan

On 12/06/2015 04:55 PM, DAVID SAROFF (RIT Student) wrote:
Matthew,

That looks right. I'm concluding that the .astype(np.uint8) is applied
after the array is constructed, instead of during the process. This
random array is a test case. In the production analysis of radio
telescope data this is how the data comes in, and there is no problem
with 10GBy files.

linearInputData = np.fromfile(dataFile, dtype=np.uint8, count=-1)
spectrumArray = linearInputData.reshape(nSpectra, sizeSpectrum)
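
One way to build the random test array without the full-size int64 intermediate is to preallocate the uint8 array and fill it a block of rows at a time. This is only a sketch: random_uint8 and chunk_rows are hypothetical names, and the chunk size is an arbitrary choice.

```python
import numpy as np

def random_uint8(shape, chunk_rows=1024):
    """Fill a uint8 array with random values, chunk_rows rows at a time."""
    out = np.empty(shape, dtype=np.uint8)
    n_rows = shape[0]
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # Only this block is held at the default integer width; the values
        # 0..255 are cast to uint8 when assigned into the slice.
        block_shape = (stop - start,) + tuple(shape[1:])
        out[start:stop] = np.random.randint(0, 256, size=block_shape)
    return out

# Small demonstration shape; the array in this thread would be (2**21, 2**12).
spectrumArray = random_uint8((10, 16), chunk_rows=4)
```

Peak extra memory is then one block at the wider integer size, not the whole array.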


On Sun, Dec 6, 2015 at 4:07 PM, Matthew Brett <matthew.brett@gmail.com> wrote:

    Hi,

    On Sun, Dec 6, 2015 at 12:39 PM, DAVID SAROFF (RIT Student)
    <dps7802@rit.edu> wrote:
    > This works. A big array of eight bit random numbers is constructed:
    >
    > import numpy as np
    >
    > spectrumArray = np.random.randint(0,255, (2**20,2**12)).astype(np.uint8)
    >
    >
    >
    > This fails. It eats up all 64GBy of RAM:
    >
    > spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)
    >
    >
    > The difference is a factor of two, 2**21 rather than 2**20, for the extent
    > of the first axis.

    I think what's happening is that this:

    np.random.randint(0,255, (2**21,2**12))

    creates 2**33 random integers, which (on 64-bit) will be of dtype
    int64 = 8 bytes = 2 ** 3 bytes each, giving total size
    2 ** (21 + 12 + 3) = 2 ** 36 bytes = 64 GiB.
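
The size estimate can be checked with a few lines of arithmetic. This is a sketch that assumes the intermediate array is int64, as on a typical 64-bit Linux build; nothing large is allocated.

```python
import numpy as np

shape = (2**21, 2**12)
n_elements = shape[0] * shape[1]           # 2**33 elements

# Bytes needed for the intermediate int64 array and for the uint8 result.
int64_bytes = n_elements * np.dtype(np.int64).itemsize
uint8_bytes = n_elements * np.dtype(np.uint8).itemsize

int64_gib = int64_bytes / 2**30            # 64.0
uint8_gib = uint8_bytes / 2**30            # 8.0
```

The intermediate alone matches the 64GBy of RAM on the machine, before the 8 GiB uint8 copy made by .astype is counted.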

    Cheers,

    Matthew
    _______________________________________________
    NumPy-Discussion mailing list
    NumPy-Discussion@scipy.org
    https://mail.scipy.org/mailman/listinfo/numpy-discussion




--
David P. Saroff
Rochester Institute of Technology
54 Lomb Memorial Dr, Rochester, NY 14623
david.saroff@mail.rit.edu | (434) 227-6242


