[Numpy-discussion] record data previous to Numpy use

Robert Kern robert.kern at gmail.com
Wed Jul 5 18:21:36 EDT 2017


On Wed, Jul 5, 2017 at 5:41 AM, <paul.carrico at free.fr> wrote:
>
> Dear all
>
> I’m sorry if my question is too basic (not fully related to Numpy,
although it is about building matrices and working with Numpy afterwards), but I’m
spending a lot of time and effort trying to find a way to read data from an ASCII
file and put it into a matrix/array … without success!
>
> The only way I found is to use the ‘append()’ method, which involves dynamic
memory allocation. :-(

Are you talking about appending to Python list objects? Or the np.append()
function on numpy arrays?

In my experience, it is usually fine to build a list with the `.append()`
method while reading a file of unknown size and then convert it to an
array afterwards, even for tens of millions of lines. The list object
over-allocates when it grows, so repeated appends are cheap (amortized
constant time). You should generally avoid the np.append() function, though;
it is not smart: every call allocates a new array and copies all of the
existing data into it.
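Here is a minimal sketch of the list-then-convert pattern. It assumes one number per line. `read_column` is a hypothetical helper, and `io.StringIO` stands in for a real open file.

```python
import io
import numpy as np

def read_column(f):
    """Read one float per line from a file-like object of unknown length."""
    values = []                          # plain Python list: cheap appends
    for line in f:
        line = line.strip()
        if line:                         # skip blank lines
            values.append(float(line))
    return np.array(values, dtype=float) # one conversion at the very end

data = read_column(io.StringIO("1.5\n2.0\n\n3.25\n"))
```

The array is built exactly once, after the size is known. Calling np.append inside the loop would copy the whole array on every line.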

> From my experience with Scilab (a Matlab-like scientific
solver), the well-known approach is:
>
> Step 1: initialize the matrix, e.g. ‘np.zeros((n, n))’
> Step 2: read the data
> Step 3: write it into the matrix
>
> I’m obviously influenced by my current experience, but I’m interested in
moving to Python and its packages
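The Scilab-style preallocation translates directly to NumPy once the size is known. One way, sketched here under the assumption of one number per line and no blank lines, is to count the lines in a first pass, rewind, and fill a preallocated array in a second pass. `read_preallocated` is a hypothetical name. Note that np.zeros takes the shape as a single tuple, e.g. `np.zeros((n, n))` for a square matrix.

```python
import io
import numpy as np

def read_preallocated(f):
    """Two passes: count the lines, allocate once, then fill in place."""
    n = sum(1 for _ in f)              # pass 1: count lines
    f.seek(0)                          # rewind for pass 2
    out = np.zeros(n, dtype=float)     # step 1: allocate (1-D here)
    for i, line in enumerate(f):       # steps 2 and 3: read and store
        out[i] = float(line)           # assumes no blank lines
    return out

arr = read_preallocated(io.StringIO("10\n20\n30\n"))
```

This reads the file twice. For a file of unknown size, the list-then-convert approach is usually just as fast and simpler.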
>
> For huge ASCII files (tens of millions of lines), my strategy
is to work in ‘blocks’:
>
> Find the line indices of the beginning and end of each block (this
implies that the file is read once)
> Read the block
> (repeat the process for the other blocks)
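If the blocks are delimited by a marker line, they can be collected in a single pass without first recording their positions. This is a sketch under these assumptions:

- Each block starts at a line containing a marker string.
- Each following line holds one float.
- Lines before the first marker are ignored.

`iter_blocks` and the `"BLOCK"` marker are hypothetical.

```python
import io
import numpy as np

def iter_blocks(f, marker):
    """Yield one NumPy array per block; a block starts at a line containing `marker`.

    Lines before the first marker are ignored; other non-blank lines are
    assumed to hold a single float each.
    """
    current = None
    for line in f:
        if marker in line:
            if current is not None:
                yield np.array(current, dtype=float)  # finish previous block
            current = []                              # start a new block
        elif current is not None and line.strip():
            current.append(float(line))
    if current is not None:                           # flush the last block
        yield np.array(current, dtype=float)

text = "header\nBLOCK\n1\n2\nBLOCK\n3\n"
blocks = list(iter_blocks(io.StringIO(text), "BLOCK"))
```

Because it is a generator, only one block needs to be in memory at a time if you process each block inside a `for` loop instead of calling `list()`.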

Are the blocks intrinsic parts of the file? Or are you just trying to break
up the file into fixed-size chunks?
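If you only need fixed-size chunks, itertools.islice can pull the next `chunk_size` lines from the same open file each time. The file position simply carries over between calls. Here is a minimal sketch, assuming one float per line; `iter_chunks` is a hypothetical name.

```python
import io
import itertools
import numpy as np

def iter_chunks(f, chunk_size):
    """Yield successive arrays of at most `chunk_size` floats from `f`."""
    while True:
        lines = list(itertools.islice(f, chunk_size))  # next chunk_size lines
        if not lines:                                  # end of file
            return
        yield np.array([float(s) for s in lines], dtype=float)

chunks = list(iter_chunks(io.StringIO("1\n2\n3\n4\n5\n"), 2))
```

The last chunk may be shorter than `chunk_size`.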

> I tried different code such as below, but each time Python tells
me I cannot mix iteration and read methods
>
> #############################################
>
> import itertools
> import numpy as np
>
> position = []
> j = 0
> with open(PATH + file_name, "r") as rough_data:
>     for line in rough_data:
>         if my_criteria in line:
>             position.append(j)  # huge blocks, but limited in number
>         j = j + 1
>
> i = 0
> blockdata = np.zeros(size_block, dtype=float)
> with open(PATH + file_name, "r") as f:
>     for line in itertools.islice(f, 1, size_block):
>         blockdata[i] = float(f.readline())

For what it's worth, this is the line that is causing the error that you
describe. When you iterate over the file with the `for line in
itertools.islice(f, ...):` loop, you already have the line text. You don't
(and can't) call `f.readline()` to get it again. It would mess up the
iteration if you did and cause you to skip lines.
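A corrected version of that loop uses `line` directly. It also fixes a second problem: in the original loop `i` is never incremented, so every value would land in `blockdata[0]`. Here `enumerate` supplies the index. `read_block` and its `start` parameter are hypothetical names, and one float per line is assumed.

```python
import io
import itertools
import numpy as np

def read_block(f, start, size_block):
    """Read lines [start, start + size_block) of `f` into a float array."""
    blockdata = np.zeros(size_block, dtype=float)
    count = 0
    for i, line in enumerate(itertools.islice(f, start, start + size_block)):
        blockdata[i] = float(line)   # use `line`; do not call f.readline() here
        count += 1
    return blockdata[:count]         # trim if the file ended early

block = read_block(io.StringIO("header\n1\n2\n3\n"), 1, 2)
```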

By the way, it helps us help you if you copy and paste the exact
code that you are running, as well as the full traceback, instead of
paraphrasing the error message.

--
Robert Kern

