[Numpy-discussion] parsing tab separated files into dictionaries - alternative to genfromtxt?

Gökhan Sever gokhansever at gmail.com
Wed Nov 11 13:16:22 EST 2009


On Wed, Nov 11, 2009 at 11:53 AM, per freem <perfreem at gmail.com> wrote:
> hi all,
>
> i've been using genfromtxt to parse tab separated files for plotting
> purposes in matplotlib. the problem is that genfromtxt seems to give
> only two ways to access the contents of the file: one is by column,
> where you can use:
>
> d = genfromtxt(...)
>
> and then do d['header_name1'] to access the column named by
> 'header_name1', d['header_name2'] to access the column named by
> 'header_name2', etc.  Or it will allow you to traverse the file line
> by line, and then access each header by number, i.e.
>
> for line in d:
>  field1 = line[0]
>  field2 = line[1]
>  # etc.
>
> the problem is that the second method relies on knowing the order of
> the fields rather than just their name, and the first method does not
> allow line by line iteration.
> ideally what i would like is to be able to traverse each line of the
> parsed file, and then refer to each of its fields by header name, so
> that if the column order in the file changes my program will be
> unaffected:
>
> for line in d:
>  field1 = line['header_name1']
>  field2 = line['header_name2']
>
> is there a way to do this using standard matplotlib/numpy/scipy
> utilities? i could write my own code to do this but it seems like
> something somebody probably already thought of a good representation
> for and has implemented a more optimized version than i could write on
> my own. does such a thing exist?
>
> thanks very much

I have a class whose constructor reads space-delimited ASCII files.

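class NasaFile(object):
    def __init__(self, filename):
        ...
        # Read the data, skipping the NLHEAD meta-header lines.
        # The transpose makes each row of _data one column of the file.
        _data = np.loadtxt(filename, dtype='float', skiprows=self.NLHEAD).T
        # Map each variable name to its column, for data['Time'] syntax
        self.data = dict(zip(self.VDESC, _data))
        ...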

There is a meta-header in this type of data, and NLHEAD is the variable
telling me how many lines to skip to reach the actual data. VDESC
tells me what each column is (starting with the Time variable, followed
by many different measurement results).
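
To make the dict(zip(...)) step concrete, here is a minimal
self-contained sketch. The sample text, the header length (nlhead) and
the column names (vdesc) are made up for illustration; in the class
above they come from the file's meta-header.

```python
import io
import numpy as np

# Hypothetical stand-in for a data file: two meta-header lines, then data.
text = """meta line 1
meta line 2
0.0 10.0 1.5
1.0 11.0 2.5
2.0 12.0 3.5
"""
nlhead = 2                             # assumed number of header lines
vdesc = ["Time", "Temp", "Pressure"]   # assumed column names

# loadtxt returns an array of shape (rows, cols); after the transpose,
# iterating over it yields one full column at a time.
columns = np.loadtxt(io.StringIO(text), dtype=float, skiprows=nlhead).T

# Pair each name with its column.
data = dict(zip(vdesc, columns))
print(data["Time"])
```

Because the lookup is by name, reordering the columns in the file only
requires the name list to be reordered with them; the code that uses
data['Time'] stays the same.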

Nothing here depends on the column order, and the class reads any file
in this format regardless of its length. For instance:

from nasafile import NasaFile

c = NasaFile("mydata")

c.data['Time'] gets me the whole Time column as an ndarray. Why do
you think dictionaries are not sufficient for your case? I was using
locals() to create automatic names, but that was not a very wise
approach.
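
On the line-by-line part of the question: as far as I can tell, when
genfromtxt is called with names=True it returns a structured array, and
each element you get by iterating over it can be indexed by field name.
A small sketch, assuming a tab-separated file whose first line holds
the header names (the sample text and the names time and temp are
invented for the example):

```python
import io
import numpy as np

# Hypothetical tab-separated content with a header line.
text = "time\ttemp\n0.0\t10.0\n1.0\t11.5\n"

# names=True takes the field names from the first line of the input.
d = np.genfromtxt(io.StringIO(text), delimiter="\t", names=True)

# Each row is a record; fields are addressed by name, not by position.
for row in d:
    print(row["time"], row["temp"])
```

One caveat: a file with a single data row gives a 0-d array, which
cannot be iterated directly; wrapping the result in np.atleast_1d should
cover that case.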






-- 
Gökhan


