[Numpy-discussion] Np.genfromtxt Problem

Fri Oct 4 18:38:16 EDT 2019

On 5 Oct 2019, at 12:15 am, Andras Deak <deak.andris at gmail.com> wrote:
> 
> On Fri, Oct 4, 2019 at 7:31 PM Stephen P. Molnar <s.molnar at sbcglobal.net> wrote:
>> 
>> 
>> I have a snippet of code
>> 
>> #!/usr/bin/env python3
>> # -*- coding: utf-8 -*-
>> """
>> 
>> Created on Tue Sep 24 07:51:11 2019
>> 
>> """
>> import numpy as np
>> 
>> files = []
>> 
>> data = np.genfromtxt(files, usecols=(3), dtype=None, skip_header=8,
>> skip_footer=1, encoding=None)
>> 
>> print(data)
>> 
>> 
>> If file is a single file the code generates the data that I want.
>> However I have a list of files that I want to process. According to
>> numpy.genfromtxt fname can be a "File, filename, list, or generator to
>> read."  If I use [13-7a_apo-1acl.RMSD    13-7_apo-1acl.RMSD
>> 14-7_apo-1acl.RMSD    15-7_apo-1acl.RMSD    17-7_apo-1acl.RMSD ] I get the
>> error:
> 
> Hi Stephen,
> 
> As far as I know genfromtxt is designed to read the contents of a
> single file. Consider this quote from the docs for the first
> parameter:
> "The strings in a list or produced by a generator are treated as lines."
> And the general description of the function says
> "Load data from a text file, with missing values handled as specified."
> ("a text file", singular)
> So if I understand correctly the list case is there so that you can
> pass `f.readlines()` or equivalent into genfromtxt. From a
> higher-level standpoint, how would reading multiple files behave if
> the files have different structure, and what type and shape should the
> function return in that case?
> If one file can be read just fine then I suggest looping over them to
> read each, one after the other. You can then tell python what to do
> with each returned array and so it doesn't have to guess.

The above is correct in that genfromtxt expects a single file or file-like object.
That said, assuming all input files have a compatible format (i.e. an identical
number of columns with matching dtypes), which really is the only case that would
make sense to pass to genfromtxt, you could try creating a pipe to concatenate all
input files into a single object. Something like this might work:

fobj = os.popen('cat 1[3457]-7a_apo-1acl.RMSD')
data = np.genfromtxt(fobj, usecols=(3), dtype=None, …)

However, the multiple headers and footers in your concatenated file may cause
trouble here, since skip_header and skip_footer only apply once to the whole
stream. Maybe you can find a way to remove them in the popen call with some
'[e]grep -v' artistry. Depending on this, the loop over input files might be the
easier solution.
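
For what it's worth, the loop variant could look roughly like the sketch below.
It is untested against your actual files, so treat it as a starting point only.
The helper name read_rmsd_column and the glob pattern are made up for
illustration, and it assumes that column 3 is numeric in every file and that
each file has the same 8-line header and 1-line footer as in your snippet.

```python
import glob

import numpy as np


def read_rmsd_column(paths, column=3, skip_header=8, skip_footer=1):
    """Read one column from each file and join the results into a 1-D array.

    Hypothetical helper: every file is assumed to share the same layout.
    """
    arrays = []
    for path in paths:
        # genfromtxt handles exactly one file per call, so the header and
        # footer are stripped separately for every input file.
        arr = np.genfromtxt(path, usecols=column, dtype=None,
                            skip_header=skip_header, skip_footer=skip_footer,
                            encoding=None)
        # A file with a single data row yields a 0-d array; make it 1-D
        # so that concatenate accepts it.
        arrays.append(np.atleast_1d(arr))
    return np.concatenate(arrays)


# Assumed file pattern; adjust it to match your directory.
files = sorted(glob.glob('1[3457]-7*_apo-1acl.RMSD'))
if files:
    data = read_rmsd_column(files)
    print(data)
```

If you would rather keep the per-file results apart, collecting them in a dict
keyed by filename instead of calling np.concatenate should work just as well.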

HTH,
					Derek