[Numpy-discussion] Forbidden character in the "names" argument of genfromtxt?

Mon Feb 20 14:02:16 EST 2012

Thanks for clearing that up.

On Mon, Feb 20, 2012 at 1:58 PM, Skipper Seabold <jsseabold at gmail.com> wrote:

> On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.olsen at gmail.com> wrote:
> > On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <hugadams at gwmail.gwu.edu> wrote:
> >> Hey everyone,
> >>
> >> I have time-series data in which each column label is simply the name of
> >> the file from which the original data was taken.  Here's some sample data:
> >>
> >> name1.txt  name2.txt  name3.txt
> >> 32              34            953
> >> 32              03            402
> >>
> >> I've noticed that the standard genfromtxt() method works great; however,
> >> the names aren't written correctly.  That is, if I use the command:
> >>
> >> print data['name1.txt']
> >>
> >> Nothing happens.
> >>
> >> However, when I remove the file extension, e.g.:
> >>
> >> name1  name2  name3
> >> 32              34            953
> >> 32              03            402
> >>
> >> Then print data['name1'] returns (32, 32) as expected.  It seems that the
> >> period in the name isn't compatible with the genfromtxt() names argument.
> >> Is there a workaround, or do I need to restructure my program to get the
> >> extension removed?  I'd rather not do this if possible for reasons that
> >> aren't important for the discussion at hand.
> >
> > It looks like the period is just getting stripped out of the names:
> >
> > In [1]: import numpy as N
> >
> > In [2]: N.genfromtxt('sample.txt', names=True)
> > Out[2]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> >
> > Interestingly, this still happens if you supply the names manually:
> >
> > In [17]: def reader(filename):
> >   ....:     infile = open(filename, 'r')
> >   ....:     names = infile.readline().split()
> >   ....:     data = N.genfromtxt(infile, names=names)
> >   ....:     infile.close()
> >   ....:     return data
> >   ....:
> >
> > In [20]: data = reader('sample.txt')
> >
> > In [21]: data
> > Out[21]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> >
> > What you can do is reset the names after genfromtxt is through with it, though:
> >
> > In [34]: def reader(filename):
> >   ....:     infile = open(filename, 'r')
> >   ....:     names = infile.readline().split()
> >   ....:     infile.close()
> >   ....:     data = N.genfromtxt(filename, names=True)
> >   ....:     data.dtype.names = names
> >   ....:     return data
> >   ....:
> >
> > In [35]: data = reader('sample.txt')
> >
> > In [36]: data
> > Out[36]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >      dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
> >
> > Be warned, I don't know why the period is getting stripped; there may
> > be a good reason, and adding it in might cause problems.
>
> I think the period is stripped because recarrays also offer attribute
> access of names. So you wouldn't be able to do
>
> your_array.sample.txt
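
To make the attribute-access point concrete: Python reads a dotted expression as a chain of attribute lookups, so a field name containing a period can never be reached with dot syntax. A minimal sketch using only the standard library follows. The `Record` class and its field are made up for illustration and are not how recarrays are implemented.

```python
class Record:
    """Toy stand-in for a record that exposes its fields as attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


# A name is reachable with dot syntax only if it is a valid identifier.
print("name1.txt".isidentifier())  # False
print("name1txt".isidentifier())   # True

rec = Record(name1txt=32.0)
print(rec.name1txt)

# rec.name1.txt is parsed as (rec.name1).txt, so the lookup fails on the
# missing attribute `name1` before `.txt` is even considered.
try:
    rec.name1.txt
except AttributeError as exc:
    print("AttributeError:", exc)
```

Dictionary-style access such as data['name1.txt'] has no such restriction, which is why renaming the fields afterwards still works for item lookup.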
>
> All the names get passed through a name validator. IIRC it's something like
>
> from numpy.lib import _iotools
>
> validator = _iotools.NameValidator()
>
> validator.validate('sample1.txt')
> validator.validate('a name with spaces')
>
> NameValidator has a good docstring and the gist of this should be in
> the genfromtxt docs, if it's not already.
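
One possible workaround that follows from this: genfromtxt accepts a `deletechars` argument that is handed to the name validator, so passing an empty string should leave the period alone. A sketch, assuming a reasonably recent NumPy (this may not hold on older releases) and using an in-memory copy of the sample data in place of sample.txt:

```python
import io

import numpy as np

# In-memory stand-in for sample.txt from the original post.
sample = io.StringIO(
    "name1.txt  name2.txt  name3.txt\n"
    "32  34  953\n"
    "32  03  402\n"
)

# deletechars="" asks the name validator not to delete any characters
# from the header names, so the periods should survive.
data = np.genfromtxt(sample, names=True, deletechars="")

print(data.dtype.names)
print(data["name1.txt"])
```

Fields named this way are still only reachable with item access (data['name1.txt']), not attribute access.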
>
> Skipper