I have a snippet of code

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Tue Sep 24 07:51:11 2019
    """
    import numpy as np

    files = []

    data = np.genfromtxt(files, usecols=(3), dtype=None, skip_header=8, skip_footer=1, encoding=None)

    print(data)

If `files` is a single file, the code generates the data that I want. However, I have a list of files that I want to process. According to the numpy.genfromtxt documentation, fname can be a "File, filename, list, or generator to read." If I use

    [13-7a_apo-1acl.RMSD 13-7_apo-1acl.RMSD 14-7_apo-1acl.RMSD 15-7_apo-1acl.RMSD 17-7_apo-1acl.RMSD ]

I get the error:

    runfile('/home/comp/Apps/Models/1-PhosphorusLigands/CombinedLigands/MOL/Docking/Results/RMSDTable/Test/DeltaGTable_s.py', wdir='/home/comp/Apps/Models/1-PhosphorusLigands/CombinedLigands/MOL/Docking/Results/RMSDTable/Test', current_namespace=True)
    Traceback (most recent call last):
      File "/home/comp/Apps/Models/1-PhosphorusLigands/CombinedLigands/MOL/Docking/Results/RMSDTable/Test/DeltaGTable_s.py", line 12, in <module>
        data = np.genfromtxt(files, usecols=(3), dtype=None, skip_header=8, skip_footer=1, encoding=None)
      File "/home/comp/Apps/Miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1762, in genfromtxt
        next(fhd)
    StopIteration

I have tried every combination of search terms that I can think of to find an example of how to make this work, without success.

How can I make this work?

Thanks in advance.

--
Stephen P. Molnar, Ph.D.
www.molecular-modeling.net
614.312.7528 (c)
Skype: smolnar1
On Fri, Oct 4, 2019 at 7:31 PM Stephen P. Molnar <s.molnar@sbcglobal.net> wrote:
Hi Stephen,

As far as I know genfromtxt is designed to read the contents of a single file. Consider this quote from the docs for the first parameter: "The strings in a list or produced by a generator are treated as lines." And the general description of the function says "Load data from a text file, with missing values handled as specified." ("a text file", singular)

So if I understand correctly the list case is there so that you can pass `f.readlines()` or equivalent into genfromtxt. From a higher-level standpoint, how would reading multiple files behave if the files have different structure, and what type and shape should the function return in that case?

If one file can be read just fine then I suggest looping over them to read each, one after the other. You can then tell python what to do with each returned array and so it doesn't have to guess.

Regards,
András
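As an illustration of the two points above, here is a minimal sketch. The file names, the stand-in file layout (2 header lines, 1 footer line) and the `results` dictionary are assumptions made for the example, not the real .RMSD files. The first part shows that a list of strings is parsed as lines of data; with five file names in the list and skip_header=8, the input presumably runs out while the header lines are being skipped, which would be consistent with the StopIteration in the traceback. The second part shows the suggested loop, one genfromtxt call per file.

```python
import os
import tempfile

import numpy as np

# A list of strings is parsed as the *lines* of one file, not as file names.
lines = ["a b c 0.5", "d e f 1.5", "g h i 2.5"]
as_lines = np.genfromtxt(lines, usecols=(3,))
print(as_lines)

# Write two small stand-in files (hypothetical layout: 2 header lines,
# 1 footer line) so that the loop below has something to read.
tmpdir = tempfile.mkdtemp()
files = []
for name, values in [("a.RMSD", [0.1, 0.2]), ("b.RMSD", [0.3, 0.4])]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write("header 1\nheader 2\n")
        fh.writelines(f"x y z {v}\n" for v in values)
        fh.write("end of data block\n")
    files.append(path)

# Read each file on its own and keep the arrays keyed by file name.
results = {}
for path in files:
    results[os.path.basename(path)] = np.genfromtxt(
        path, usecols=(3,), skip_header=2, skip_footer=1
    )

for name, column in results.items():
    print(name, column)
```

Keeping one array per file leaves the decision about how to combine them (or whether to combine them at all) with the caller.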
On 5 Oct 2019, at 12:15 am, Andras Deak <deak.andris@gmail.com> wrote:
The above is correct in that genfromtxt expects a single file or file-like object. That said, assuming all input files have a compatible format (i.e. an identical number of columns with matching dtypes), which really is the only case that would make sense to pass to genfromtxt, you could try creating a pipe to concatenate all input files into a single object. Something like this might work:

    import os
    fobj = os.popen('cat 1[3457]-7a_apo-1acl.RMSD')
    data = np.genfromtxt(fobj, usecols=(3), dtype=None, …)

However, the multiple headers and footers in your concatenated file may cause trouble here. Maybe you can find a way to remove them in the popen call with some '[e]grep -v' artistry. Depending on this, the loop over input files might be the easier solution.

HTH,
Derek
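A pure-Python variant of the concatenation idea, as a rough sketch only: instead of `cat` and `grep`, a generator drops the header and footer lines of each file and hands the remaining lines to genfromtxt as one stream. The helper name `data_lines`, the stand-in file names and their layout (2 header lines, 1 footer line) are assumptions for the example.

```python
import os
import tempfile

import numpy as np


def data_lines(paths, skip_header, skip_footer):
    """Yield the data lines of each file, dropping its header and footer lines."""
    for path in paths:
        with open(path) as fh:
            lines = fh.readlines()
        yield from lines[skip_header:len(lines) - skip_footer]


# Two small stand-in files (hypothetical layout: 2 header lines, 1 footer line).
tmpdir = tempfile.mkdtemp()
paths = []
for name, values in [("a.RMSD", [0.1, 0.2]), ("b.RMSD", [0.3, 0.4])]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write("header 1\nheader 2\n")
        fh.writelines(f"x y z {v}\n" for v in values)
        fh.write("end of data block\n")
    paths.append(path)

# The generator is consumed as the lines of a single, header-free "file".
combined = np.genfromtxt(
    data_lines(paths, skip_header=2, skip_footer=1), usecols=(3,)
)
print(combined)
```

The result is one flat array, so the information about which value came from which file is lost; the per-file loop keeps it.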
On Fri, Oct 4, 2019, at 10:31, Stephen P. Molnar wrote:
data = np.genfromtxt(files, usecols=(3), dtype=None, skip_header=8, skip_footer=1, encoding=None)
This seems like a good use case for `dask.dataframe.read_csv` [0].

Stéfan

[0] https://examples.dask.org/dataframes/01-data-access.html#Read-CSV-files
On 10/04/2019 08:19 PM, Stefan van der Walt wrote:
I appreciate the responses that I've received. I feel that I must apologize for the one important fact that, it would appear, I failed to mention: all of the files that I wish to process are identical.

--
Stephen P. Molnar, Ph.D.
www.molecular-modeling.net
614.312.7528 (c)
Skype: smolnar1
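If "identical" means that every file has the same layout and the same number of data rows (an assumption, since the thread does not say so explicitly), the per-file columns can be stacked into one 2-D array with one row per file. The sketch below uses in-memory `io.StringIO` objects as hypothetical stand-ins for the .RMSD files; `make_file` and the file names are made up for the example.

```python
import io

import numpy as np


def make_file(values):
    """Build an in-memory stand-in file: 2 header lines, data rows, 1 footer line."""
    body = "".join(f"x y z {v}\n" for v in values)
    return io.StringIO("header 1\nheader 2\n" + body + "end of data block\n")


sources = {
    "a.RMSD": make_file([0.1, 0.2, 0.3]),
    "b.RMSD": make_file([0.4, 0.5, 0.6]),
}

# One genfromtxt call per file, each returning a 1-D column of equal length.
columns = [
    np.genfromtxt(fobj, usecols=(3,), skip_header=2, skip_footer=1)
    for fobj in sources.values()
]

# Stack the columns so that row i holds the values read from file i.
table = np.vstack(columns)
print(list(sources))
print(table)
```

If the files turn out to have different numbers of rows, `np.vstack` raises an error, and keeping a list or dictionary of separate arrays is the safer choice.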
participants (4)
- Andras Deak
- Derek Homeier
- Stefan van der Walt
- Stephen P. Molnar