Forbidden character in the "names" argument of genfromtxt?

Hey everyone,

I have timeseries data in which each column label is simply the name of the file the original data was taken from. Here's some sample data:

    name1.txt name2.txt name3.txt
    32 34 953
    32 03 402

I've noticed that the standard genfromtxt() function works great; however, the names aren't written correctly. That is, if I use the command:

    print data['name1.txt']

Nothing happens.

However, when I remove the file extension, e.g.:

    name1 name2 name3
    32 34 953
    32 03 402

then print data['name1'] returns (32, 32) as expected. It seems that the period in the name isn't compatible with the genfromtxt() names argument. Is there a workaround, or do I need to restructure my program to get the extension removed? I'd rather not do this if possible, for reasons that aren't important for the discussion at hand.

Thanks.

On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <hugadams@gwmail.gwu.edu> wrote:
> It seems that the period in the name isn't compatible with the genfromtxt() names attribute. Is there a workaround, or do I need to restructure my program to get the extension removed?

It looks like the period is just getting stripped out of the names:

    In [1]: import numpy as N

    In [2]: N.genfromtxt('sample.txt', names=True)
    Out[2]:
    array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
          dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])

Interestingly, this still happens if you supply the names manually:

    In [17]: def reader(filename):
       ....:     infile = open(filename, 'r')
       ....:     names = infile.readline().split()
       ....:     data = N.genfromtxt(infile, names=names)
       ....:     infile.close()
       ....:     return data
       ....:

    In [20]: data = reader('sample.txt')

    In [21]: data
    Out[21]:
    array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
          dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])

What you can do is reset the names after genfromtxt is through with it, though:

    In [34]: def reader(filename):
       ....:     infile = open(filename, 'r')
       ....:     names = infile.readline().split()
       ....:     infile.close()
       ....:     data = N.genfromtxt(filename, names=True)
       ....:     data.dtype.names = names
       ....:     return data
       ....:

    In [35]: data = reader('sample.txt')

    In [36]: data
    Out[36]:
    array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
          dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])

Be warned, I don't know why the period is getting stripped; there may be a good reason, and adding it in might cause problems.

~Brett
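For anyone finding this thread later, here is a minimal, self-contained sketch of the same "reset the names afterwards" workaround in current Python. It is not Brett's exact code: the helper name read_with_original_names and the in-memory SAMPLE text are assumptions made for illustration, and it reads from a string instead of sample.txt so it can be run as-is.

```python
import io

import numpy as np

# Assumed stand-in for the contents of sample.txt from the original post.
SAMPLE = """name1.txt name2.txt name3.txt
32 34 953
32 03 402
"""


def read_with_original_names(text):
    """Load whitespace-delimited text, then restore the raw header names.

    Hypothetical helper: genfromtxt sanitises the header (dropping the
    period), so the names are overwritten once loading has finished.
    """
    header = text.splitlines()[0].split()
    data = np.genfromtxt(io.StringIO(text), names=True)
    # The number of names must match the number of fields.
    data.dtype.names = tuple(header)
    return data


data = read_with_original_names(SAMPLE)
print(data.dtype.names)
print(data["name1.txt"])
```

Field lookup by the original label works after this, but attribute-style access on a recarray view still cannot reach a name containing a period. genfromtxt also documents a deletechars argument that controls which characters are removed from names; whether passing an empty string keeps the period may depend on the NumPy version, so treat that as something to try rather than a guarantee.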

Thanks Brett. I appreciate you taking the time to help me out. In particular, I did not know the correct syntax for this:

    data.dtype.names = names

which is very helpful. If I had known how to access data.dtype.names, I think it would have saved me a great deal of trouble. I guess it's all part of the learning curve.

I'll keep in mind that the period may cause problems later; however, as far as I can tell so far, nothing goes wrong when I access the data.

On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.olsen@gmail.com> wrote:
> What you can do is reset the names after genfromtxt is through with it, though:
> [...]
> Be warned, I don't know why the period is getting stripped; there may be a good reason, and adding it in might cause problems.
>
> ~Brett
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.olsen@gmail.com> wrote:
> Be warned, I don't know why the period is getting stripped; there may be a good reason, and adding it in might cause problems.

I think the period is stripped because recarrays also offer attribute access of names. So you wouldn't be able to do

    your_array.sample.txt

All the names get passed through a name validator. IIRC it's something like

    from numpy.lib import _iotools

    validator = _iotools.NameValidator()
    validator.validate('sample1.txt')
    validator.validate('a name with spaces')

NameValidator has a good docstring and the gist of this should be in the genfromtxt docs, if it's not already.

Skipper
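To make the attribute-access argument concrete, here is a rough pure-Python sketch of the kind of clean-up a name validator performs. It is a simplified stand-in, not NumPy's NameValidator: the function name sanitize_name and the choice to delete every ASCII punctuation character except the underscore are assumptions made for illustration.

```python
import string

# Assumption: drop all ASCII punctuation except the underscore.
DELETE_CHARS = set(string.punctuation) - {"_"}


def sanitize_name(name, replace_space="_"):
    """Return a version of `name` that is usable as a Python attribute."""
    name = name.strip().replace(" ", replace_space)
    return "".join(ch for ch in name if ch not in DELETE_CHARS)


for raw in ("name1.txt", "a name with spaces"):
    clean = sanitize_name(raw)
    # str.isidentifier() reports whether `obj.<name>` could even be parsed.
    print(repr(raw), raw.isidentifier(), "->", repr(clean), clean.isidentifier())
```

A label such as name1.txt is not a valid identifier, so your_array.name1.txt would be parsed as the attribute txt of your_array.name1, which is why a validator of this kind removes the period.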

Thanks for clearing that up.

On Mon, Feb 20, 2012 at 1:58 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
> I think the period is stripped because recarrays also offer attribute access of names.
> [...]
> All the names get passed through a name validator.

participants (3)
- Adam Hughes
- Brett Olsen
- Skipper Seabold