<div dir="ltr">Are you tied to ASCII files?   HDF5 (via h5py or pytables) might be a better storage format for what you are describing.<div><br></div><div>Tom</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Jul 5, 2017 at 8:42 AM <<a href="mailto:paul.carrico@free.fr">paul.carrico@free.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="font-size:10pt;font-family:Verdana,Geneva,sans-serif">

<p>Dear all</p>

<p><br></p>

<p>I’m sorry if my question is too basic (it is not strictly about NumPy, although the goal is to build matrices and work with NumPy afterward), but I’m spending a lot of time and effort trying to read data from an ASCII file and store it in a matrix/array … without success!</p>

<p><br></p>

<p>The only way I have found is to use the <em>‘append()’</em> method, which involves dynamic memory allocation. <span style="color:#ff0000">:-(</span></p>

<p><br></p>

<p>From my current experience with Scilab (a Matlab-like scientific solver), the usual approach is well known:</p>

<ol>

<li>Step 1: initialize the matrix, e.g. <em>‘np.zeros((n, n))’</em></li>

<li>Step 2: read the data</li>

<li>Step 3: write it into the matrix</li>

</ol>
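<p>The three steps above can be sketched in NumPy as follows. This is only a minimal illustration: the size <em>n</em> and the <em>rows</em> list are hypothetical stand-ins for data parsed from a file. Note that <em>np.zeros</em> takes the shape as a single tuple.</p>

<pre>
```python
import numpy as np

n = 3  # hypothetical matrix size

# Step 1: allocate the whole matrix once (the shape is a tuple).
matrix = np.zeros((n, n))

# Step 2: stand-in for values parsed from the file (invented numbers).
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# Step 3: write each row into the preallocated matrix.
for i, row in enumerate(rows):
    matrix[i, :] = row
```
</pre>

<p>Because the array is allocated up front, no <em>append()</em> call is needed while filling it.</p>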

<p><br></p>

<p>I’m obviously influenced by my current experience, but I’m interested in moving to Python and its packages.</p>

<p><br></p>

<p>For huge ASCII files (tens of millions of lines), my strategy is to work by ‘blocks’:</p>

<ul>

<li>Find the line indices of the beginning and the end of each block (this implies that the file is read once)</li>

<li>Read the block</li>

<li>(repeat the process for the other blocks)</li>

</ul>
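<p>One possible way to sketch this block strategy is shown here, assuming each block starts with a line containing a known marker and holds one float per line. The helper names <em>find_block_starts</em> and <em>read_block</em>, the marker text and the demo file are all hypothetical. <em>np.fromiter</em> with <em>count</em> fills a preallocated array, so no list is grown along the way.</p>

<pre>
```python
import itertools
import os
import tempfile

import numpy as np


def find_block_starts(path, marker):
    """Return the 0-based index of every line that contains `marker`."""
    with open(path, "r") as handle:
        return [index for index, line in enumerate(handle) if marker in line]


def read_block(path, first, last):
    """Read lines first..last-1 (0-based) as floats into a 1-D array."""
    with open(path, "r") as handle:
        selected = itertools.islice(handle, first, last)
        values = (float(line) for line in selected)
        # count lets NumPy allocate the output array once.
        return np.fromiter(values, dtype=float, count=last - first)


# Tiny demo file: each block is a marker line followed by its values.
demo_lines = ["BLOCK A", "1.0", "2.0", "BLOCK B", "3.5", "4.5", "5.5"]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(demo_lines) + "\n")
    demo_path = tmp.name

starts = find_block_starts(demo_path, "BLOCK")
# The first block spans the lines between the first two markers.
block_a = read_block(demo_path, starts[0] + 1, starts[1])
os.remove(demo_path)
```
</pre>

<p>The file is scanned once to locate the markers and then once more per block; only one block is held in memory at a time.</p>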

<p><br></p>

<p>I tried different codes such as the one below, but each time Python tells me that <strong>I cannot mix iteration and read methods</strong></p>

<p>#############################################</p>

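<p>import itertools</p>

<p>import numpy as np</p>

<p><br></p>

<p>position = []; j = 0</p>

<p>with open(PATH + file_name, "r") as rough_data:</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;for line in rough_data:</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if <em>my_criteria</em> in line:</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;position.append(j) ## huge blocks but limited in number</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;j = j + 1</p>

<p><br></p>

<p>i = 0</p>

<p>blockdata = np.zeros(size_block, dtype=float)</p>

<p>with open(PATH + file_name, "r") as f:</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;for line in itertools.islice(f, 1, size_block):</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;blockdata[i] = float(f.readline()) ## mixes iteration over f with f.readline()</p>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;i = i + 1</p>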

<p> #########################################</p>
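<p>A hedged note on the error: the second loop iterates over the file through <em>itertools.islice</em> and also calls <em>f.readline()</em> inside the loop body. Python 2 rejects this combination with a message about mixing iteration and read methods. The usual fix is to convert the loop variable <em>line</em> itself. Note also that <em>islice(f, 1, size_block)</em> stops before index <em>size_block</em>, so it yields only <em>size_block - 1</em> lines. The sketch below uses an in-memory stand-in for the file and invented values.</p>

<pre>
```python
import io
import itertools

import numpy as np

# Stand-in for the real file; the values are invented for illustration.
fake_file = io.StringIO("header\n1.5\n2.5\n3.5\n")
size_block = 3  # hypothetical number of values in the block

blockdata = np.zeros(size_block, dtype=float)

# Skip the first line, then take exactly size_block lines.
for i, line in enumerate(itertools.islice(fake_file, 1, 1 + size_block)):
    blockdata[i] = float(line)  # use the loop variable, not readline()
```
</pre>

<p>Using <em>enumerate</em> also removes the need for the manual counter <em>i</em>.</p>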

<p><br></p>

<p>Should I work on lists using f.readlines() instead (although this implies loading the whole file into memory)?</p>

<p><br></p>

<p><u>Additional question</u>: can I use vectorization when reading, with ‘i = np.arange(0, 65406)’, if I stay with the previous example?</p>
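<p>On both questions above, one option that avoids <em>f.readlines()</em> and an explicit index is to let <em>np.loadtxt</em> parse a whole block in one call. The sketch assumes one float per line and a NumPy version that supports the <em>max_rows</em> argument (1.16 or later); the in-memory file and its values are invented.</p>

<pre>
```python
import io

import numpy as np

# Stand-in for the real file: one header line, a block of three values,
# then a line that belongs to the next block (all values invented).
fake_file = io.StringIO("header\n1.5\n2.5\n3.5\n9.9\n")

# skiprows jumps over the header; max_rows limits the read to the block.
blockdata = np.loadtxt(fake_file, skiprows=1, max_rows=3)
```
</pre>

<p>The result is already a float array of the block's length, so no separate <em>np.zeros</em> allocation or <em>np.arange</em> index is required.</p>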

<p><br></p>

<p><br></p>

<p>Thanks for your time and understanding</p>

<p>(I’m obviously interested in documentation references covering those specific tasks)</p>

<p><br></p>

<p>Paul</p>

<p><br></p>

<p>PS for Chuck: I’ll have a look at the pandas package, but later, in a code optimization step :-) (nearly 2000 pages of documentation)</p>



</div>

_______________________________________________<br>

NumPy-Discussion mailing list<br>

<a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

</blockquote></div>