Forbidden character in the "names" argument of genfromtxt? - NumPy-Discussion


Adam Hughes

Feb. 19, 2012

2:12 a.m.

Hey everyone,

I have timeseries data in which the column label is simply the name of the file from which the original data was taken. Here's some sample data:

name1.txt name2.txt name3.txt
32 34 953
32 03 402

I've noticed that the standard genfromtxt() method works great; however, the names aren't written correctly. That is, if I use the command:

print data['name1.txt']

nothing happens. However, when I remove the file extension, e.g.:

name1 name2 name3
32 34 953
32 03 402

then print data['name1'] returns (32, 32) as expected. It seems that the period in the name isn't compatible with the genfromtxt() names argument. Is there a workaround, or do I need to restructure my program to remove the extension? I'd rather not do this if possible, for reasons that aren't important for the discussion at hand.

Thanks.
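The behaviour described above can be reproduced in a few lines. This is a minimal sketch in Python 3 with a current NumPy, not the 2012 setup from the post. It reads the sample data from an in-memory string instead of a file, and `SAMPLE` is a name introduced only for this example. The sketch checks that the period is removed from the field names in two cases: when the header row is used (`names=True`) and when the labels are passed in explicitly as a list.

```python
import io

import numpy as np

# The sample data from the post; SAMPLE is a hypothetical name for this sketch.
SAMPLE = """name1.txt name2.txt name3.txt
32 34 953
32 03 402
"""

# names=True: the first line is used as the field names.
from_header = np.genfromtxt(io.StringIO(SAMPLE), names=True)
print(from_header.dtype.names)  # ('name1txt', 'name2txt', 'name3txt')

# Passing the labels explicitly (and skipping the header row) goes through
# the same name cleanup, so the periods are removed here too.
labels = ["name1.txt", "name2.txt", "name3.txt"]
explicit = np.genfromtxt(io.StringIO(SAMPLE), names=labels, skip_header=1)
print(explicit.dtype.names)     # ('name1txt', 'name2txt', 'name3txt')

# The original label is not a field, but the sanitized one is.
print("name1.txt" in from_header.dtype.names)  # False
print(from_header["name1txt"])                  # [32. 32.]
```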


Brett Olsen

February 2012

6:35 p.m.



It looks like the period is just getting stripped out of the names:

In [1]: import numpy as N

In [2]: N.genfromtxt('sample.txt', names=True)
Out[2]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])

Interestingly, this still happens if you supply the names manually:

In [17]: def reader(filename):
   ....:     infile = open(filename, 'r')
   ....:     names = infile.readline().split()
   ....:     data = N.genfromtxt(infile, names=names)
   ....:     infile.close()
   ....:     return data
   ....:

In [20]: data = reader('sample.txt')

In [21]: data
Out[21]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])

What you can do is reset the names after genfromtxt is through with it, though:

In [34]: def reader(filename):
   ....:     infile = open(filename, 'r')
   ....:     names = infile.readline().split()
   ....:     infile.close()
   ....:     data = N.genfromtxt(filename, names=True)
   ....:     data.dtype.names = names
   ....:     return data
   ....:

In [35]: data = reader('sample.txt')

In [36]: data
Out[36]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])

Be warned, I don't know why the period is getting stripped; there may be a good reason, and adding it in might cause problems.

~Brett
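The second reader() above can be adapted for Python 3 and a current NumPy. This is a sketch, not code from the thread. It makes these assumptions:

* the file is whitespace-delimited, with the labels on its first line;
* a `with` block is used so the file is always closed;
* the field names are renamed by assigning to `dtype.names`, which NumPy's structured-array documentation allows as long as the new sequence has one string per field.

The function name `read_keep_labels` is hypothetical.

```python
import os
import tempfile

import numpy as np


def read_keep_labels(filename):
    """Load a whitespace-delimited table and keep the header labels verbatim.

    Hypothetical helper following the approach from the thread: let
    genfromtxt parse the file, then overwrite the sanitized field names.
    """
    # Read the raw header ourselves so labels such as 'name1.txt' keep the '.'.
    with open(filename) as infile:
        labels = infile.readline().split()

    # genfromtxt parses the numbers but strips '.' from the field names.
    data = np.genfromtxt(filename, names=True)

    # Renaming requires exactly one string per existing field.
    if len(labels) != len(data.dtype.names):
        raise ValueError("header has %d labels but data has %d fields"
                         % (len(labels), len(data.dtype.names)))
    data.dtype.names = labels  # modifies the array's dtype in place
    return data


if __name__ == "__main__":
    # Write the sample data from the thread to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n")
        path = f.name
    try:
        data = read_keep_labels(path)
        print(data.dtype.names)   # ('name1.txt', 'name2.txt', 'name3.txt')
        print(data["name1.txt"])  # [32. 32.]
    finally:
        os.remove(path)
```

Once the names are restored, columns are accessed by indexing, as in data["name1.txt"].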

Adam Hughes

6:43 p.m.


Thanks, Brett. I appreciate you taking the time to help me out. In particular, I did not know the correct syntax for this:

data.dtype.names = names

which is very helpful. If I had known how to access data.dtype.names, I think it would have saved me a great deal of trouble. I guess it's all part of the learning curve. I'll keep in mind that the period may cause problems later; however, as far as I can tell so far, nothing goes wrong when I access the data.

On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.olsen@gmail.com> wrote:

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
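There is a related option to assigning data.dtype.names. That assignment changes the array's dtype in place. If you would rather leave the parsed array untouched, numpy.lib.recfunctions.rename_fields returns a view of the same data with new field names.

This is a minimal sketch under two assumptions. First, the sanitized names come back in the same order as the header labels. Second, the array literal stands in for what genfromtxt returns for the sample file. `restore_labels` is a hypothetical helper.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def restore_labels(data, labels):
    """Return a view of `data` whose fields are renamed to `labels`.

    Hypothetical helper: pairs fields with labels by position and leaves
    `data` (and its dtype) unchanged.
    """
    if len(labels) != len(data.dtype.names):
        raise ValueError("need exactly one label per field")
    mapping = dict(zip(data.dtype.names, labels))
    return rfn.rename_fields(data, mapping)


# Stand-in for what genfromtxt returns for the sample file.
parsed = np.array(
    [(32.0, 34.0, 953.0), (32.0, 3.0, 402.0)],
    dtype=[("name1txt", "f8"), ("name2txt", "f8"), ("name3txt", "f8")],
)
renamed = restore_labels(parsed, ["name1.txt", "name2.txt", "name3.txt"])

print(renamed.dtype.names)   # ('name1.txt', 'name2.txt', 'name3.txt')
print(parsed.dtype.names)    # ('name1txt', 'name2txt', 'name3txt')
print(renamed["name1.txt"])  # [32. 32.]
```

Because the result is a view, it shares memory with the original, so writing to one is visible in the other.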

Skipper Seabold

6:58 p.m.



I think the period is stripped because recarrays also offer attribute access to field names, so you wouldn't be able to do

your_array.sample.txt

All the names get passed through a name validator. IIRC it's something like

from numpy.lib import _iotools
validator = _iotools.NameValidator()
validator.validate('sample1.txt')
validator.validate('a name with spaces')

NameValidator has a good docstring, and the gist of this should be in the genfromtxt docs, if it's not already.

Skipper
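Skipper's pointer to the validator also suggests a way to avoid renaming afterwards. genfromtxt passes its public `deletechars` and `replace_space` parameters on to that name validator. In current NumPy releases, `deletechars` is the set of characters stripped from names, and the default includes '.'.

This is a minimal sketch assuming such a NumPy version: passing an empty string keeps the period. The trade-off Skipper describes still applies. A name like 'name1.txt' cannot be used for attribute access, so the column has to be indexed as data['name1.txt']. `SAMPLE` is a name introduced only for this example.

```python
import io

import numpy as np

# The sample data from the thread; SAMPLE is a hypothetical name.
SAMPLE = """name1.txt name2.txt name3.txt
32 34 953
32 03 402
"""

# Default: '.' is among the deleted characters, so it is stripped.
default = np.genfromtxt(io.StringIO(SAMPLE), names=True)

# An empty deletechars stops the validator from deleting characters such as '.'.
kept = np.genfromtxt(io.StringIO(SAMPLE), names=True, deletechars="")

print(default.dtype.names)  # ('name1txt', 'name2txt', 'name3txt')
print(kept.dtype.names)     # ('name1.txt', 'name2.txt', 'name3.txt')
print(kept["name1.txt"])    # [32. 32.]
```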

Adam Hughes

7:02 p.m.


Thanks for clearing that up.

On Mon, Feb 20, 2012 at 1:58 PM, Skipper Seabold <jsseabold@gmail.com> wrote:

