Re: [SciPy-User] Request for obscure text formats

27 Jun 2021

Hi David,

The JCAMP files look pretty simple to me, with a well-formatted header.  If
a new instrument came with such a file, I would think "great, I don't have
to spend hours writing speciality code to handle its output".

Is the idea to be able to actually parse files into a usable data structure
including extracting data from the header?

Some of the "obscure" data formats I deal with come from custom, complex
instruments (beamlines) around the world, each with its own home-built
data collection system.  They aren't proprietary or deliberately
obfuscated; it just turns out that there is a lot of variety.  There have
been efforts to standardize even the ASCII-only data files, and examples
of real files are at

https://github.com/xraypy/xraylarch/tree/master/examples/xafsdata/beamlines

Just to be clear, parsing those headers at least enough to be able to get
a sensible guess for the name for each column would be an important part of
the goal.  Getting "just" the table of numbers is not a problem.  I have
solutions for that, but I'd be interested to see what you might come up
with.
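
To make the column-name part concrete, here is a minimal sketch of the kind
of heuristic I mean.  It assumes header lines start with a comment character
and that the last header line with as many words as there are data columns
holds the labels.  The function name, the comment characters, and the sample
text are all made up for illustration; real beamline files vary a lot more
than this.

```python
def guess_column_labels(text, comment_chars="#;%"):
    """Return (labels, rows) for an ASCII table with a commented header.

    Heuristic only: the labels are taken from the last header line that
    has exactly as many words as the data has columns.
    """
    header, rows = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped[0] in comment_chars:
            header.append(stripped.lstrip(comment_chars).strip())
            continue
        try:
            rows.append([float(word) for word in stripped.split()])
        except ValueError:
            # Not purely numeric: treat as header text without a marker.
            header.append(stripped)

    ncols = len(rows[0]) if rows else 0
    for hline in reversed(header):
        words = hline.split()
        is_rule = all(set(word) <= set("-=_") for word in words)
        if len(words) == ncols and not is_rule:
            return words, rows
    # Nothing plausible found: fall back to generic names.
    return [f"col{i + 1}" for i in range(ncols)], rows


# Invented sample, not taken from any real beamline file.
SAMPLE = """\
# Beamline: hypothetical-13
# Scan date: 2021-06-27
#------------------------
# energy  i0  itrans
 8000.0  12345.0  6789.0
 8001.0  12350.0  6770.0
"""

labels, rows = guess_column_labels(SAMPLE)
print(labels, len(rows))
```

Searching from the bottom of the header matters here: the "Scan date" line
also happens to have three words, so a top-down search would pick the wrong
line.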

If you are looking for a real challenge, the CIF format for
Crystallographic Information (see https://www.iucr.org/resources/cif) would
almost certainly provide one.  It uses a primitive ASCII encoding for
multiple tables, basically a flat file where YAML, JSON, or even XML
or SQLite3 would (now) make much more sense.  For people dealing with
atomic structures of crystals, this format is not obscure.  There are
several existing parsers, including in Python, and many software tools work
with this format.  A real example would look like
http://rruff.geo.arizona.edu/AMS/CIF_text_files/07779_cif.txt with many
more examples at http://rruff.geo.arizona.edu/AMS/amcsd.php and
https://www.crystallography.net/cod/
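
To give a flavor of the "multiple tables in a flat file" layout, here is a
rough sketch that pulls the `loop_` tables out of CIF-like text.  It is
deliberately simplified and is not a real CIF parser: it ignores
semicolon-delimited multi-line text fields, leans on `shlex.split` as an
approximation of CIF quoting, and keeps every value as a string.  The
function name and the sample values are invented for illustration.

```python
import shlex


def _rows(tags, values):
    """Chunk a flat list of values into rows of len(tags) items."""
    n = len(tags)
    return [values[i:i + n] for i in range(0, len(values), n)]


def parse_cif_loops(text):
    """Return a list of {'tags': [...], 'rows': [[...], ...]} dicts."""
    loops = []
    tags, values, in_loop = [], [], False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        starts_loop = stripped.lower() == "loop_"
        starts_block = stripped.lower().startswith("data_")
        is_tag = stripped.startswith("_")
        # A loop ends at the next loop_ or data_ line, or at the first
        # tag that appears after the loop's values have started.
        if in_loop and (starts_loop or starts_block or (is_tag and values)):
            loops.append({"tags": tags, "rows": _rows(tags, values)})
            in_loop = False
        if starts_loop:
            tags, values, in_loop = [], [], True
        elif in_loop and is_tag:
            tags.append(stripped.split()[0])
        elif in_loop:
            values.extend(shlex.split(stripped))
    if in_loop:
        loops.append({"tags": tags, "rows": _rows(tags, values)})
    return loops


# Invented sample in CIF style; the numbers are not from a real structure.
SAMPLE = """\
data_example
_cell_length_a 5.43
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
Si1 0.0 0.0 0.0
Si2 0.25 0.25 0.25
loop_
_symmetry_equiv_pos_as_xyz
'x,y,z'
'-x,-y,z'
"""

loops = parse_cif_loops(SAMPLE)
for loop in loops:
    print(loop["tags"], len(loop["rows"]))
```

Even this toy version needs a small state machine to tell where one table
stops and the next begins, which is part of why the format makes a good
challenge.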

--Matt

On Sat, Jun 26, 2021 at 9:27 AM David Hagen <david@drhagen.com> wrote:
> ...
> ...
>> are you interested in issues with text encoding
> No, just in (presumably ASCII) text that someone might want to parse
> into a Python object. Like a JCAMP-DX file [1] if there was not
> already a JCAMP-DX parser on PyPI [2].
> ...
>> Is it only "text" or binary data as well
> Text is only useful for my immediate purpose because I want to show it
> on a slide. However, Parsita can be used to write parsers for byte
> strings as well.
>
> [1] http://www.chm.bris.ac.uk/~paulmay/temp/pcc/jcamp.htm
> [2] https://pypi.org/project/jcamp/
-- 
--Matt Newville <newville at cars.uchicago.edu> 630-327-7411