Allan,

I see from a Google search on your name that you are in the physics department at Rutgers. I got my BA in Physics there. 1975. Biological physics.

A thought: Is there an entropy that can be assigned to the DNA in an organism? I don't mean the usual thing, coupled to the heat bath. Evolution blindly explores metabolic and signaling pathways, and tends towards disorder, as long as the result still functions. Someone working out signaling pathways some years ago wrote that they were senselessly complex, branched and interlocked. I think that is to be expected. Evolution doesn't find minimalist, clear, rational solutions. Look at the Amazon rain forest. What are all those beetles and butterflies and frogs for? It is the wrong question. I think some measure of the complexity could be related to the amount of time that ecosystem has existed. Similarly for genomes.

On Sun, Dec 6, 2015 at 6:55 PM, Allan Haldane <allanhaldane@gmail.com> wrote:

I've also often wanted to generate large datasets of random uint8 and uint16. As a workaround, this is something I have used:

np.ndarray(100, 'u1', np.random.bytes(100))

It has also crossed my mind that np.random.randint and np.random.rand could use an extra 'dtype' keyword. It didn't look easy to implement though.
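
A minimal sketch of the workaround above, written with np.frombuffer, next to the dtype-keyword form. The second form assumes NumPy 1.11 or later, where, as far as I know, np.random.randint did gain a dtype argument; the variable names are only for illustration.

```python
import numpy as np

n = 100

# Reinterpret n random bytes as unsigned 8-bit integers.
# The result is a read-only view onto the bytes object.
a = np.frombuffer(np.random.bytes(n), dtype=np.uint8)

# 16-bit values need two bytes per element.
b = np.frombuffer(np.random.bytes(2 * n), dtype=np.uint16)

# Assumption: NumPy >= 1.11, where randint accepts a dtype keyword.
# The upper bound is exclusive, so 256 is needed to cover 0..255.
c = np.random.randint(0, 256, size=n, dtype=np.uint8)
```

Either way the array is built at its final width, so no 8-byte-per-element intermediate is created. Call .copy() on the frombuffer results if a writable array is needed.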

Allan

On 12/06/2015 04:55 PM, DAVID SAROFF (RIT Student) wrote:
Matthew,

That looks right. I'm concluding that the .astype(np.uint8) is applied
after the array is constructed, instead of during the process. This
random array is a test case. In the production analysis of radio
telescope data this is how the data comes in, and there is no problem
with 10 GB files:
linearInputData = np.fromfile(dataFile, dtype = np.uint8, count = -1)
spectrumArray = linearInputData.reshape(nSpectra,sizeSpectrum)
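
For the random test case, one way around the wide intermediate is to preallocate the uint8 array and fill it a block of rows at a time. This is only a sketch: the helper name random_uint8 and the chunk size are made up for illustration, and the small shape in the usage line is just so it runs quickly.

```python
import numpy as np

def random_uint8(shape, rows_per_chunk=1024):
    """Fill a uint8 array block by block so the wide temporary stays small."""
    out = np.empty(shape, dtype=np.uint8)
    for start in range(0, shape[0], rows_per_chunk):
        block = out[start:start + rows_per_chunk]
        # randint returns the platform default integer type here;
        # the assignment casts each block down to uint8.
        # The upper bound is exclusive, so 256 covers 0..255.
        block[...] = np.random.randint(0, 256, size=block.shape)
    return out

# Small demonstration shape; the real case would be (2**21, 2**12).
spectrumArray = random_uint8((2**10, 2**12))
```

At any moment only one block exists at the default integer width, so peak memory is roughly the final uint8 array plus one chunk.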


On Sun, Dec 6, 2015 at 4:07 PM, Matthew Brett <matthew.brett@gmail.com> wrote:

    Hi,

    On Sun, Dec 6, 2015 at 12:39 PM, DAVID SAROFF (RIT Student)
    <dps7802@rit.edu> wrote:
    > This works. A big array of eight bit random numbers is constructed:
    >
    > import numpy as np
    >
    > spectrumArray = np.random.randint(0,255, (2**20,2**12)).astype(np.uint8)
    >
    >
    >
    > This fails. It eats up all 64GBy of RAM:
    >
    > spectrumArray = np.random.randint(0,255, (2**21,2**12)).astype(np.uint8)
    >
    >
    > The difference is a factor of two, 2**21 rather than 2**20, for the extent
    > of the first axis.

    I think what's happening is that this:

    np.random.randint(0,255, (2**21,2**12))

    creates 2**33 random integers, which (on 64-bit) will be of dtype
    int64 = 8 bytes, giving total size 2 ** (21 + 12 + 3) = 2 ** 36 bytes
    = 64 GiB.
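
The arithmetic can be checked directly. A small sketch, counting in bytes (int64 is 8 = 2**3 bytes per element) and assuming the default integer type is int64, as on the 64-bit machine discussed here:

```python
import numpy as np

n_elements = 2**21 * 2**12  # 2**33 elements

# Size of the intermediate array randint builds before .astype() runs.
int64_bytes = n_elements * np.dtype(np.int64).itemsize

# Size of the uint8 array that is actually wanted.
uint8_bytes = n_elements * np.dtype(np.uint8).itemsize

gib = 2**30
print(int64_bytes // gib, "GiB as int64")
print(uint8_bytes // gib, "GiB as uint8")
```

So the intermediate alone is 2**36 bytes, which fills a 64 GB machine before the conversion to uint8 even starts, while the final array needs only 2**33 bytes.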

    Cheers,

    Matthew
    _______________________________________________
    NumPy-Discussion mailing list
    NumPy-Discussion@scipy.org
    https://mail.scipy.org/mailman/listinfo/numpy-discussion




--
David P. Saroff
Rochester Institute of Technology
54 Lomb Memorial Dr, Rochester, NY 14623
david.saroff@mail.rit.edu | (434) 227-6242





