Hi David,

The JCAMP files look pretty simple to me, with a well-formatted header. If a new instrument came with such a file, I would think "great, I don't have to spend hours writing speciality code to handle its output". Is the idea to be able to actually parse files into a usable data structure, including extracting data from the header?

Some of the "obscure" data formats I deal with come from different custom, complex instruments (beamlines) around the world with custom, home-built data collection systems - they aren't proprietary or deliberately obfuscated, it just turns out that there is a lot of variety. There have been efforts to standardize even the ASCII-only data files; examples of real files are at
https://github.com/xraypy/xraylarch/tree/master/examples/xafsdata/beamlines

Just to be clear, parsing those headers at least enough to be able to get a sensible guess for the name of each column would be an important part of the goal. Getting "just" the table of numbers is not a problem. I have solutions for that, but I'd be interested to see what you might come up with.

If you are looking for a real challenge, the CIF format for Crystallographic Information (see https://www.iucr.org/resources/cif) would almost certainly provide one. It uses a primitive ASCII encoding for multiple tables, basically using a flat file where yaml, json, or even XML or SQLite3 would (now) make much more sense. For people dealing with atomic structures of crystals, this format is not obscure. There are several existing parsers, including in Python, and many software tools work with this format. A real example would look like
http://rruff.geo.arizona.edu/AMS/CIF_text_files/07779_cif.txt
with many more examples at
http://rruff.geo.arizona.edu/AMS/amcsd.php
and
https://www.crystallography.net/cod/

--Matt

On Sat, Jun 26, 2021 at 9:27 AM David Hagen <david@drhagen.com> wrote:
Are you interested in issues with text encoding?
No, just in (presumably ASCII) text that someone might want to parse into a Python object. Like a JCAMP-DX file [1] if there was not already a JCAMP-DX parser on PyPI [2].
Is it only "text", or binary data as well?
Text is only useful for my immediate purpose because I want to show it on a slide. However, Parsita can be used to write parsers for byte strings as well.
[1] http://www.chm.bris.ac.uk/~paulmay/temp/pcc/jcamp.htm
[2] https://pypi.org/project/jcamp/
_______________________________________________
SciPy-User mailing list
SciPy-User@python.org
https://mail.python.org/mailman/listinfo/scipy-user
-- --Matt Newville <newville at cars.uchicago.edu> 630-327-7411
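
For illustration only, here is a minimal sketch of the "sensible guess for the name of each column" idea from the reply above. It is not taken from any existing parser. The function name `guess_column_names`, the set of comment characters, and the sample header are all made up for this example, and the heuristic (take the header line nearest the data whose word count matches the number of columns) is just one plausible assumption about how such files are laid out.

```python
def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def guess_column_names(text, comment_chars="#;%"):
    """Guess column labels from the header line nearest the data table.

    Hypothetical helper: the heuristic and the default comment
    characters are assumptions, not any format's specification.
    """
    header_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        tokens = stripped.split()
        if all(_is_number(t) for t in tokens):
            ncols = len(tokens)  # first row of the numeric table
            break
        header_lines.append(stripped)
    else:
        raise ValueError("no numeric data row found")

    # Walk the header backwards: the label line is usually closest to the data.
    for line in reversed(header_lines):
        words = line.lstrip(comment_chars).split()
        # Accept the line only if every word contains a letter, which
        # skips separator lines such as "#------".
        if len(words) == ncols and all(
            any(ch.isalpha() for ch in word) for word in words
        ):
            return words

    # Fallback when no header line matches the column count.
    return [f"col{i + 1}" for i in range(ncols)]


# Made-up sample file, not copied from any real beamline.
sample = """# Beamline: hypothetical
#------------------
# energy  i0  itrans
8800.0  1.23e5  4.56e4
8801.0  1.24e5  4.55e4
"""

print(guess_column_names(sample))  # ['energy', 'i0', 'itrans']
```

A heuristic like this will fail on files that put labels after the data, or that use multi-word labels, which is part of why the variety of real files makes header parsing the hard part.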