[Numpy-discussion] record data previous to Numpy use

paul.carrico at free.fr paul.carrico at free.fr
Wed Jul 5 14:39:49 EDT 2017


Hi 

Thanks for the answer.

An ASCII file is the input format (and the only one I can deal with).

HDF5 might be an export format (it is one of the options) in order to
speed up the post-processing stage.

Paul 

On 2017-07-05 20:19, Thomas Caswell wrote:

> Are you tied to ASCII files?   HDF5 (via h5py or pytables) might be a better storage format for what you are describing. 
> 
> Tom 
> 
> On Wed, Jul 5, 2017 at 8:42 AM <paul.carrico at free.fr> wrote: 
> 
>> Dear all 
>> 
>> I'm sorry if my question is too basic (it is not strictly about Numpy, although the goal is to build matrices and work with Numpy afterward), but I'm spending a lot of time and effort trying to find a way to read data from an ASCII file and store it in a matrix/array ... without success!
>> 
>> The only way I found is to use the _'append()'_ instruction, which involves dynamic memory allocation. :-(
>> 
>> From my current experience with Scilab (a Matlab-like scientific solver), the well-known approach is:
>> 
>> * Step 1: initialize the matrix, e.g. _'np.zeros((n, n))'_
>> * Step 2: read the data
>> * Step 3: write it into the matrix
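
The three steps listed above could look like this in NumPy. This is only a minimal sketch: the size `n` and the `rows` list are made-up stand-ins for data that would really come from a file. Note that `np.zeros` takes the shape as a single tuple.

```python
import numpy as np

n = 3  # hypothetical size, known before reading the data

# Step 1: allocate the whole matrix once (the shape is passed as one tuple)
matrix = np.zeros((n, n))

# Steps 2 and 3: read each row of values and write it into its slot
rows = ["1 2 3", "4 5 6", "7 8 9"]  # stand-in for lines read from a file
for i, text in enumerate(rows):
    matrix[i, :] = [float(v) for v in text.split()]
```

Because the array is allocated once, no memory is reallocated while the rows are filled in.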
>> 
>> I'm obviously influenced by my current experience, but I'm interested in moving to Python and its packages.
>> 
>> For huge ASCII files (tens of millions of lines), my strategy is to work by 'blocks':
>> 
>> * Find the line index of the beginning and the end of one block (this implies that the file is read once)
>> * Read the block
>> * Repeat the process for the other blocks
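
The block strategy above can be sketched as follows, assuming each block starts with a marker line and otherwise holds one float per line. The helper names (`find_marker_lines`, `read_block`), the `*BLOCK` marker and the demo file are all hypothetical, chosen only for illustration.

```python
import itertools
import os
import tempfile

import numpy as np


def find_marker_lines(path, marker):
    """One pass over the file: indices of marker lines and total line count."""
    positions = []
    n_lines = 0
    with open(path, "r") as handle:
        for j, line in enumerate(handle):
            if marker in line:
                positions.append(j)
            n_lines = j + 1
    return positions, n_lines


def read_block(path, start, stop):
    """Read lines start..stop-1 (one float per line) into a preallocated array."""
    block = np.zeros(stop - start, dtype=float)
    with open(path, "r") as handle:
        for i, line in enumerate(itertools.islice(handle, start, stop)):
            block[i] = float(line)
    return block


# Small demonstration file: each block starts with a marker line
demo_lines = ["*BLOCK", "1.0", "2.0", "3.0", "*BLOCK", "4.5", "5.5"]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(demo_lines) + "\n")
    demo_path = tmp.name

markers, n_lines = find_marker_lines(demo_path, "*BLOCK")
# A block runs from the line after its marker to the next marker (or end of file)
bounds = [(m + 1, end) for m, end in zip(markers, markers[1:] + [n_lines])]
blocks = [read_block(demo_path, start, stop) for start, stop in bounds]
os.remove(demo_path)
```

Each call to `read_block` reopens the file and skips lines from the top, which is simple but rescans the file once per block. With many blocks, keeping the file open and reading the blocks in order would probably be cheaper.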
>> 
>> I tried different codes such as the one below, but each time Python tells me that I cannot mix iteration and read methods.
>> 
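>> #############################################
>> 
>> import itertools
>> import numpy as np
>> 
>> # PATH, file_name, my_criteria and size_block are defined elsewhere
>> 
>> # First pass: record the index of every line matching the criterion
>> position = []
>> with open(PATH + file_name, "r") as rough_data:
>>     for j, line in enumerate(rough_data):
>>         if my_criteria in line:
>>             position.append(j)  # huge blocks, but limited in number
>> 
>> # Second pass: read one block into a preallocated array
>> blockdata = np.zeros(size_block, dtype=float)
>> with open(PATH + file_name, "r") as f:
>>     # Use the line yielded by the loop; calling f.readline() inside
>>     # the loop is what mixes iteration and read methods
>>     for i, line in enumerate(itertools.islice(f, 1, size_block)):
>>         blockdata[i] = float(line)
>> 
>> #########################################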
>> 
>> Should I work on lists using f.readlines()? (This implies loading the whole file into memory.)
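
Loading the whole file with `f.readlines()` is not required: a file object is a lazy iterator, so lines can be consumed one at a time. Below is a rough sketch, assuming one float per line and a block position that is already known. `np.fromiter` builds the array directly from the iterator, without an intermediate list. The demo file and the names `demo_path`, `start` and `count` are made up for the example.

```python
import itertools
import os
import tempfile

import numpy as np

# Small demonstration file: one header line, then one float per line
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\n0.5\n1.5\n2.5\n3.5\n")
    demo_path = tmp.name

start, count = 1, 3  # skip the header, then take three values

with open(demo_path, "r") as handle:
    wanted = itertools.islice(handle, start, start + count)
    # count= lets NumPy allocate the result once, before consuming the lines
    values = np.fromiter((float(line) for line in wanted), dtype=float, count=count)

os.remove(demo_path)
```

Only the lines up to the end of the requested block are read, and only one line is held in memory at a time.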
>> 
>> Additional question: can I read the data with vectorization, using 'i = np.arange(0, 65406)', if I stay with the previous example?
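
On the vectorization question: NumPy accepts an index array on the left-hand side of an assignment, so a whole set of values can be written in one statement once they have been parsed. Here is a small sketch with made-up values and a short index range standing in for `np.arange(0, 65406)`:

```python
import numpy as np

parsed = [0.1, 0.2, 0.3, 0.4]  # stand-in for values already converted to float

blockdata = np.zeros(6, dtype=float)
i = np.arange(0, 4)            # indices 0, 1, 2, 3

blockdata[i] = parsed          # one assignment, no Python-level loop
# For a contiguous range, the slice blockdata[0:4] = parsed does the same
```

This only vectorizes the write into the array. Converting each text line to a float still happens line by line while the file is read.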
>> 
>> Thanks for your time and understanding.
>> 
>> (I'm obviously interested in doc references covering these specific tasks.)
>> 
>> Paul 
>> 
>> PS: for Chuck: I'll have a look at the pandas package, but in a code optimization step :-) (nearly 2000 doc pages)
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion