Hi all,

On Friday, 4 Jan 2013, we had a hangout to discuss the next steps to be taken for the implementation of a C language library for GDF. We call this library gdfio. For reference, the GDF standard can be found at

https://bitbucket.org/yt_analysis/yt/src/554d144d9d248c6f70d8c665a5963aa39b2d6bb3/yt/utilities/grid_data_format/docs/gdf_specification.txt?at=yt

One of the biggest issues we discussed was whether to rely on libraries other than HDF5 when writing gdfio. The main argument for doing so is that none of us are experienced C programmers, so implementing things like hash tables, linked lists, and so forth might be a barrier to making progress. The main argument against is that gdfio should be a very low-level library that can be deployed on many different systems, and dependencies make this difficult.

In order to best assess what to do, we attempted to identify *what* non-stdlib C features we would actually need in our implementation. Because GDF itself does not make any links between grids (only recording their parents in an optional metadata step), we came to the conclusion that the only thing we need is a hash table, and even that is optional and only needed for reading (e.g., to look up a grid by its ID). For example, you could call something like gdfio_read_grid(grid_id, "density") and get back an object that includes both the density data and its associated metadata. We thus decided to proceed without *any* dependencies aside from HDF5.
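
To make this concrete, here is a purely hypothetical sketch of what such a read call might hand back. Nothing here is settled API: the struct, its fields, and the signature are placeholders meant only to illustrate the data-plus-metadata idea.

    #include <stdint.h>

    /* Hypothetical return type: field data bundled with the grid's
     * metadata.  The fields shown are illustrative, not the spec. */
    typedef struct {
        int64_t grid_id;        /* which grid this came from */
        int     dimensions[3];  /* grid shape in x, y, z */
        double  left_edge[3];   /* example spatial metadata */
        double  right_edge[3];
        double *data;           /* the requested field, e.g. density */
    } gdfio_grid_t;

    /* Hypothetical reader: look up grid_id (perhaps via the optional
     * hash table mapping IDs to HDF5 group names), read the named
     * field, and hand back data plus metadata in one object. */
    gdfio_grid_t *gdfio_read_grid(int64_t grid_id, const char *field);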

Kacper pointed out that the most important issue for efficiency is how to convert native memory structures into GDF's /data/grid_%010i/ structures without expensive copies. Native data could be 5D (block, quantity, x, y, z), 4D, or 3D depending on the code. On this point, Sam noted that the easiest approach might be to require a "buffer"-type interface that supplies a (pointer, size) pair and lets gdfio grab the requisite number of floats or doubles. We decided to try this buffer approach first. Essentially, this is a question of how much gdfio provides to users who will wrap it to write their code's data. For now, we decided to keep gdfio's offerings minimal; this lets us see how the approach works in practice and which additional features would be most valuable to add later.
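
As a rough illustration of what Sam's buffer interface could look like, here is a sketch. The names (gdfio_buffer_t, gdfio_write_grid) and the type enum are assumptions rather than agreed-upon API; the grounded parts are just the (pointer, size) pair and the float/double distinction.

    #include <stddef.h>
    #include <stdint.h>

    /* Which native element type the caller's array holds. */
    typedef enum { GDFIO_FLOAT32, GDFIO_FLOAT64 } gdfio_type_t;

    /* A flat (pointer, size) view into the caller's native array,
     * whatever its dimensionality (3D, 4D, or 5D). */
    typedef struct {
        const void  *ptr;   /* start of the caller's data */
        size_t       count; /* number of elements available at ptr */
        gdfio_type_t type;  /* float or double */
    } gdfio_buffer_t;

    /* Hypothetical writer: gdfio pulls the requisite number of
     * elements from buf and writes them under /data/grid_%010i/,
     * without an intermediate copy of the caller's array. */
    int gdfio_write_grid(int64_t grid_id, const char *field,
                         const gdfio_buffer_t *buf);

Keeping the surface this small matches the decision above: start minimal, and grow the interface only once we see what wrappers actually need.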

Casey and Sam both brought up the question of how to parallelize. Parallel HDF5 can be quite tricky to deal with, so we decided to forgo it for now. We agreed that the simplest path forward is to use file links: each non-root-I/O processor writes its data to a separate data-only file that is linked back into the main HDF5 file. This means we need to add an API for creating, writing, and reading these data-only files.
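
For reference, HDF5 already supports this pattern natively via external links (H5Lcreate_external, available since HDF5 1.8). Below is a sketch of how gdfio might link a grid from a per-rank data-only file back into the main file; the function, the file-naming scheme, and the per-grid linking granularity are all illustrative assumptions.

    #include <hdf5.h>
    #include <stdio.h>

    /* Link one grid, written by a given rank into its own data-only
     * file, back into /data/ of the main GDF file.  The naming
     * scheme here is made up for illustration. */
    void gdfio_link_grid(hid_t main_file, int rank, long grid_id)
    {
        char data_file[64], grid_name[64], link_name[80];

        snprintf(data_file, sizeof data_file, "gdf_data_%04d.h5", rank);
        snprintf(grid_name, sizeof grid_name, "/grid_%010ld", grid_id);
        snprintf(link_name, sizeof link_name, "/data%s", grid_name);

        /* H5Lcreate_external makes a link in main_file that resolves
         * transparently to the object in the data-only file, so
         * readers still see the usual /data/grid_%010i layout. */
        H5Lcreate_external(data_file, grid_name, main_file, link_name,
                           H5P_DEFAULT, H5P_DEFAULT);
    }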

Finally, we settled on the next step: I (Jeff) will draft an API for gdfio this week and submit it to yt-dev for discussion and iteration. Once we have an API that looks good, we'll begin coding it up.

If I misrepresented anything from the meeting or GDF, please let me know. Thanks to all who participated!

j