[Numpy-discussion] record data previous to Numpy use
Thomas Caswell
tcaswell at gmail.com
Wed Jul 5 14:19:40 EDT 2017
Are you tied to ASCII files? HDF5 (via h5py or pytables) might be a
better storage format for what you are describing.
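
If the data does have to stay in ASCII, a two-pass read into a preallocated
array avoids both append() and the "mixing iteration and read methods"
error: loop over the file object only and never call f.readline() inside
that loop. A minimal sketch, assuming one float per line and a marker
string on the line that opens each block; find_block_starts, read_block
and the "BLOCK" marker are made-up names for illustration, not anything
from your code:

```python
import itertools
import os
import tempfile

import numpy as np


def find_block_starts(path, marker):
    """Pass 1: return the 0-based index of every line containing marker."""
    with open(path, "r") as f:
        return [j for j, line in enumerate(f) if marker in line]


def read_block(path, start, n_values):
    """Pass 2: read n_values floats from the lines that follow line `start`."""
    block = np.zeros(n_values, dtype=float)  # preallocated, no append()
    with open(path, "r") as f:
        # islice skips ahead to the first data line of the block;
        # use the loop variable `line`, not f.readline()
        lines = itertools.islice(f, start + 1, start + 1 + n_values)
        for i, line in enumerate(lines):
            block[i] = float(line)
    return block


# Tiny demo file: two blocks of three values each (made-up data).
demo = "BLOCK\n1.0\n2.0\n3.0\nBLOCK\n4.0\n5.0\n6.0\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(demo)
    demo_path = tmp.name

starts = find_block_starts(demo_path, "BLOCK")
second = read_block(demo_path, starts[1], 3)
os.remove(demo_path)
```

For blocks of plain numbers, np.loadtxt with its skiprows and max_rows
arguments (max_rows needs a reasonably recent NumPy) may do the second
pass for you without an explicit loop.
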
Tom
On Wed, Jul 5, 2017 at 8:42 AM <paul.carrico at free.fr> wrote:
> Dear all
>
>
> I’m sorry if my question is too basic (it is not strictly about Numpy,
> although the goal is to build matrices and work with Numpy afterward), but I’m
> spending a lot of time and effort trying to find a way to read data from an ASCII
> file and assign it to a matrix/array … without success!
>
>
> The only way I found is to use the *‘append()’* instruction, which involves dynamic
> memory allocation. :-(
>
>
> From my current experience with Scilab (a Matlab-like scientific solver),
> the usual approach is well known:
>
> 1. Step 1: initialize the matrix, e.g. *‘np.zeros((n, n))’*
> 2. Step 2: read the data
> 3. Step 3: write it into the matrix
>
>
> I’m obviously influenced by my current experience, but I’m interested in
> moving to Python and its packages
>
>
> For huge ASCII files (tens of millions of lines), my strategy
> is to work by ‘blocks’:
>
> - Find the line index of the beginning and the end of one block (this
> implies that the file is read once)
> - Read the block
> - (repeat the process for the other blocks)
>
>
> I tried different codes such as the one below, but each time Python tells me *I
> cannot mix iteration and record method*
>
> #############################################
> position = []; j = 0
> with open(PATH + file_name, "r") as rough_data:
>     for line in rough_data:
>         if my_criteria in line:
>             position.append(j)  ## huge blocks but limited in number
>         j = j + 1
>
> i = 0
> blockdata = np.zeros((size_block), dtype=np.float)
> with open(PATH + file_name, "r") as f:
>     for line in itertools.islice(f, 1, size_block):
>         blockdata[i] = float(f.readline())
>         i = i + 1
> #########################################
>
>
> Should I work on lists using f.readlines()? (But this implies loading the whole
> file into memory.)
>
>
> *Additional question*: can I read the data in a vectorized way, e.g. with ‘i
> = np.arange(0, 65406)’, if I stay with the previous example?
>
>
>
> Thanks for your time and understanding
>
> (I’m obviously interested in doc references that cover those specific
> tasks)
>
>
> Paul
>
>
> PS: for Chuck: I’ll have a look at the pandas package, but later, at the code
> optimization step :-) (nearly 2000 doc pages)