Which is the better way to parse this file?
Roberto A. F. De Almeida
roberto at dealmeida.net
Tue Sep 2 13:09:25 EDT 2003
"Terry Reedy" <tjreedy at udel.edu> wrote in message news:<au2dnSnl__hyPMmiU-KYgw at comcast.com>...
> I suspect that what you actually want to do is parse structures 'like'
> the above, as defined by a grammar not shown ;-)
Yes, you're right. :)
The grammar is not complex, but I'm still struggling to process the
result tree.
> You did not specify whether you will get such files from an
> uncontrollable external source or whether you control the input format.
> If the latter, there is no obvious reason for separate database,
> sequence, and structure productions since all three result in
> dictionaries with no functional difference.
This is a Dataset Descriptor for the Data Access Protocol
(http://www.unidata.ucar.edu/packages/dods/design/dap-rfc-html/), an
API to access remote datasets. DAP servers describe their datasets
using this grammar, and I'm developing a module to access DAP servers.
> > I want to obtain a dictionary like this:
> >
> > >>> pprint.pprint(data)
> > {'casts': {'experimenter': None,
> > 'location': {'latitude': None, 'longitude': None},
> > 'time': None,
> > 'xbt': {'depth': None, 'temperature': None}},
> > 'catalog_number': None}
> > The values ('None') will be filled later.
>
> Using None as placeholders either tosses the type information or
> requires that it be recorded elsewhere. Use the int and float type
> objects instead. Note that standard Python cannot differentiate
> between float and float64.
Ok. One of the strong points of DAP is that data is retrieved only for
your region/period of interest. I created a class and redefined
__getitem__ so that data is only retrieved from the server when the
object is sliced.
>>> data = file("http://dods.gso.uri.edu/cgi-bin/nph-nc/data/fnoc1.nc")
>>> print data.variables['lat'].shape
(17,)
>>> print data.variables['lat'][1:4] # only this subset is retrieved
[ 47.5 45. 42.5 40. ]
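The idea behind the redefined __getitem__ can be sketched roughly as
follows. This is only an illustration, not the actual module: the
class name LazyVariable, the fake_server function and the latitude
values are all made up, and a real implementation would issue an HTTP
request where fake_server is called. It is written for current Python
rather than the version used in the session above.

```python
class LazyVariable:
    """Hypothetical stand-in for a remote variable; data is fetched only when sliced."""

    def __init__(self, shape, fetch):
        self.shape = shape    # known from the dataset description alone
        self._fetch = fetch   # callable(start, stop, step) -> list of values
        self.requests = []    # record of the ranges actually requested

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        # Resolve None and negative bounds against the known length.
        start, stop, step = key.indices(self.shape[0])
        self.requests.append((start, stop, step))
        return self._fetch(start, stop, step)


def fake_server(start, stop, step):
    # Assumption: stands in for a request to the remote server.
    # The values are invented for the example.
    latitudes = [50.0 - 2.5 * i for i in range(17)]
    return latitudes[start:stop:step]


lat = LazyVariable((17,), fake_server)
print(lat.shape)      # answered locally, nothing is requested
print(lat[1:4])       # only this range is requested
print(lat.requests)
```

Reading the shape costs nothing, and each slice turns into exactly one
request for the range that was asked for.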
> I know nothing of SimpleParse (and therefore, of what would be
> different). If the grammar is as simple as I infer from the sample --
> dataset and sequences containing sequences, structures, and types -- I
> would reread about recursive-descent parsing and maybe try that. The
> type_entry function would return a (name, typeobject) pair and the
> structure, sequence, and database functions a (name, dict) pair.
Yes, it's very simple. As you see, even a structure is identical to a
sequence. The declarations are basically "types" or declarations
containing "types". Do you think it can be done without 3rd party
modules?
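To make the question concrete, here is a minimal recursive-descent
sketch along the lines you describe, using only the standard library.
It rests on several assumptions: the input text is my reconstruction
from the dictionary above (the member types are guesses), the type
table is a simplified mapping, and array dimensions and error
reporting for truncated input are not handled. The function names are
made up for the example.

```python
import re

# Assumed mapping from declared type names to Python type objects.
BASE_TYPES = {
    "Byte": int, "Int16": int, "UInt16": int, "Int32": int, "UInt32": int,
    "Float32": float, "Float64": float, "String": str, "Url": str,
}
# Dataset, Sequence and Structure all produce a plain dictionary.
CONSTRUCTORS = {"Dataset", "Sequence", "Structure"}


def tokenize(text):
    # Braces and semicolons are tokens of their own; the rest splits on whitespace.
    return re.findall(r"[{};]|[^\s{};]+", text)


def parse_declaration(tokens, pos):
    """Parse one declaration at tokens[pos]; return ((name, value), next_pos)."""
    kind = tokens[pos]
    if kind in CONSTRUCTORS:
        if tokens[pos + 1] != "{":
            raise SyntaxError("expected '{' after %s" % kind)
        pos += 2
        value = {}
        while tokens[pos] != "}":
            # Recurse: a member is itself a declaration.
            (member, member_value), pos = parse_declaration(tokens, pos)
            value[member] = member_value
        name, end = tokens[pos + 1], pos + 2
    elif kind in BASE_TYPES:
        value = BASE_TYPES[kind]
        name, end = tokens[pos + 1], pos + 2
    else:
        raise SyntaxError("unknown type %r" % kind)
    if tokens[end] != ";":
        raise SyntaxError("expected ';' after %s" % name)
    return (name, value), end + 1


def parse(text):
    (name, value), _ = parse_declaration(tokenize(text), 0)
    return name, value


# Reconstructed sample; the types are assumptions.
SAMPLE = """
Dataset {
    Int32 catalog_number;
    Sequence {
        String experimenter;
        Int32 time;
        Structure {
            Float64 latitude;
            Float64 longitude;
        } location;
        Sequence {
            Float64 depth;
            Float64 temperature;
        } xbt;
    } casts;
} data;
"""

name, data = parse(SAMPLE)
print(name)
print(data)
```

One function covers all three constructors, which matches your point
that they differ in name only, and the leaves hold type objects
instead of None.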
> But as hinted above, I would think about simplifying the grammar
> before worrying about parsing. If you only have sequences of
> sequences and type entries, parsing is trivial.
I'll take a look at that. Thanks very much for the insights.
Regards,
Roberto