[Numpy-discussion] record data previous to Numpy use
Robert McLeod
robbmcleod at gmail.com
Wed Jul 5 20:41:00 EDT 2017
While I'm going to bet that the fastest way to build an ndarray from ASCII
is with an `io.BytesIO` stream, NumPy does have a function to load from text,
`numpy.loadtxt`, that works well enough for most purposes.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
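As a minimal sketch of the `numpy.loadtxt` route: the sample text below is made up and stands in for the real file, which I haven't seen. `loadtxt` accepts a filename or a file-like object, so an in-memory `io.StringIO` is enough to try it out.

```python
import io

import numpy as np

# Hypothetical sample standing in for the contents of an ASCII data file.
text = "1.0 2.0 3.0\n4.0 5.0 6.0\n"

# np.loadtxt accepts a filename or any file-like object that yields lines;
# whitespace is the default column delimiter.
data = np.loadtxt(io.StringIO(text))

print(data.shape)
```

For a file on disk, passing the path directly (`np.loadtxt("some_file.txt")`, name assumed) does the same thing.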
It's hard to tell from the original post whether the ASCII is being
continuously generated or not. If it's being produced in an ongoing fashion
then a stream object is definitely the way to go, as the array chunks can be
produced by `numpy.frombuffer()`.
https://docs.python.org/3/library/io.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.frombuffer.html
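A rough sketch of the chunked-stream idea, with one caveat: `numpy.frombuffer()` reinterprets raw binary bytes and does not parse text, so this only applies if the stream carries binary values. ASCII chunks would still need to be parsed (for example with `numpy.loadtxt`). The stream contents, the float64 dtype and the chunk size are all assumptions for illustration.

```python
import io

import numpy as np

# Hypothetical stream: six float64 values written as raw bytes.
stream = io.BytesIO(np.arange(6, dtype=np.float64).tobytes())

chunk_items = 4  # assumed chunk size, in array elements
itemsize = np.dtype(np.float64).itemsize

chunks = []
while True:
    raw = stream.read(chunk_items * itemsize)
    if not raw:
        break
    # frombuffer wraps the bytes without copying or parsing them;
    # the resulting array is read-only because `raw` is immutable.
    chunks.append(np.frombuffer(raw, dtype=np.float64))

data = np.concatenate(chunks)
print(data)
```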
Robert
On Wed, Jul 5, 2017 at 3:21 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Jul 5, 2017 at 5:41 AM, <paul.carrico at free.fr> wrote:
> >
> > Dear all
> >
> > I’m sorry if my question is too basic (and not fully related to NumPy,
> > although the goal is to build matrices and work with NumPy afterward),
> > but I’m spending a lot of time and effort trying to find a way to read
> > data from an ASCII file and assign it to a matrix/array, without success!
> >
> > The only way I found is to use the ‘append()’ instruction, which involves
> > dynamic memory allocation. :-(
>
> Are you talking about appending to Python list objects? Or the np.append()
> function on numpy arrays?
>
> In my experience, it is usually fine to build a list with the `.append()`
> method while reading a file of unknown size and then convert it to an
> array afterwards, even for tens of millions of lines. The list object is
> quite smart about reallocating memory so it is not that expensive. You
> should generally avoid the np.append() function, though; it is not smart.
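The list-then-convert pattern described above might look like the sketch below. The in-memory file and its one-number-per-line layout are assumptions, since the real file format isn't shown in the thread.

```python
import io

import numpy as np

# Hypothetical stand-in for an open text file with one number per line.
f = io.StringIO("1.5\n2.5\n3.5\n")

values = []
for line in f:
    # list.append over-allocates, so growth is amortized constant time.
    values.append(float(line))

# Convert once at the end instead of calling np.append() per line,
# which would copy the whole array on every call.
data = np.array(values, dtype=np.float64)
print(data)
```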
>
> > From my current experience with Scilab (a Matlab-like scientific
> > solver), the well-known approach is:
> >
> > Step 1: initialize the matrix, like ‘np.zeros((n, n))’
> > Step 2: read the data
> > Step 3: write it into the matrix
> >
> > I’m obviously influenced by my current experience, but I’m interested in
> > moving to Python and its packages.
> >
> > For huge ASCII files (tens of millions of lines), my strategy is to work
> > by ‘blocks’ as follows:
> >
> > Find the line index of the beginning and the end of one block (this
> > implies that the file is read once)
> > Read the block
> > (process repeated on the other blocks)
>
> Are the blocks intrinsic parts of the file? Or are you just trying to
> break up the file into fixed-size chunks?
>
> > I tried different codes such as the one below, but each time Python tells
> > me I cannot mix iteration and read methods
> >
> > #############################################
> >
> > position = []; j = 0
> > with open(PATH + file_name, "r") as rough_data:
> >     for line in rough_data:
> >         if my_criteria in line:
> >             position.append(j)  ## huge blocks but limited in number
> >         j = j + 1
> >
> > i = 0
> > blockdata = np.zeros((size_block), dtype=np.float)
> > with open(PATH + file_name, "r") as f:
> >     for line in itertools.islice(f, 1, size_block):
> >         blockdata[i] = float(f.readline())
>
> For what it's worth, this is the line that is causing the error that you
> describe. When you iterate over the file with the `for line in
> itertools.islice(f, ...):` loop, you already have the line text. You don't
> (and can't) call `f.readline()` to get it again. It would mess up the
> iteration if you did and cause you to skip lines.
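One possible way to write that loop so it uses the `line` already supplied by the iteration is sketched below. The sample data and the value of `size_block` are made up, and `enumerate` is added here to advance the index.

```python
import io
import itertools

import numpy as np

# Hypothetical stand-in for the opened file: one header line, then numbers.
f = io.StringIO("header\n1.0\n2.0\n3.0\n4.0\n")
size_block = 4  # assumed value; used as the islice stop index

# islice(f, 1, size_block) yields lines 1 .. size_block - 1 (0-based).
blockdata = np.zeros(size_block - 1, dtype=float)
for i, line in enumerate(itertools.islice(f, 1, size_block)):
    # Use the line from the iterator; do not call f.readline() here.
    blockdata[i] = float(line)

print(blockdata)
```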
>
> By the way, it is useful to help us help you if you copy-paste the exact
> code that you are running as well as the full traceback instead of
> paraphrasing the error message.
>
> --
> Robert Kern
--
Robert McLeod, Ph.D.
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch <robert.mcleod at ethz.ch>
robbmcleod at gmail.com
