* HDF5 supports fixed-length and variable-length string arrays encoded in ASCII or UTF-8. In all cases, these strings are NULL-terminated (despite the documentation claiming that there are more options). In practice, the ASCII strings permit high-bit characters, but their encoding is unspecified. Memory-mapping is rare (but apparently possible). The two major HDF5 bindings are waiting for a fixed-length UTF-8 numpy dtype before they support that HDF5 option. Compression is supported for fixed-length string arrays but not for variable-length string arrays.
* FITS supports fixed-length string arrays that are NULL-padded. The strings do not have a formal encoding, but in practice they are mostly ASCII characters with the occasional high-bit character from an unspecified encoding. Memory-mapping is a common practice. These arrays can be quite large even if each scalar is reasonably small.
* pandas uses object arrays for flexible in-memory handling of string columns. Lengths are not fixed, and None is used as a marker for missing data. String columns must be written to and read from a variety of formats, including CSV, Excel, and HDF5, some of which are Unicode-aware and work with `unicode/str` objects instead of `bytes`.
* There are a number of sometimes-poorly-documented, often-poorly-adhered-to, aging file format "standards" that include string arrays but either do not specify encodings or have encoding specifications that are ignored in practice. This can make the usual "Unicode sandwich" (decode on input, work with text internally, encode on output) difficult to perform at the I/O boundaries.
* In Python 3 environments, `unicode/str` objects are rather more common, and simple operations such as equality comparison no longer work between `bytes` and `unicode/str`. This makes it difficult to work with numpy string arrays that yield `bytes` scalars.
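
Several of the points above (NULL-padded fixed-width fields, unspecified encodings, and the `bytes` versus `unicode/str` mismatch in Python 3) can be illustrated with a small standard-library sketch. The helper names `encode_field` and `decode_field` are hypothetical and are not part of numpy or any file-format binding. The latin-1 fallback is only an assumption about how one might cope with high-bit bytes of unknown encoding.

```python
# Hypothetical helpers for illustration only; not a numpy or HDF5/FITS API.

def encode_field(text: str, width: int, encoding: str = "utf-8") -> bytes:
    """Encode text into a fixed-width field, padded on the right with NULs."""
    data = text.encode(encoding)
    if len(data) > width:
        raise ValueError(f"{text!r} needs {len(data)} bytes, field holds {width}")
    return data.ljust(width, b"\x00")


def decode_field(raw: bytes, encoding: str = "ascii") -> str:
    """Strip trailing NUL padding and decode a fixed-width field."""
    stripped = raw.rstrip(b"\x00")
    try:
        return stripped.decode(encoding)
    except UnicodeDecodeError:
        # Assumption: fall back to latin-1, which maps every byte to a code
        # point. It never fails, but it may produce the wrong characters.
        return stripped.decode("latin-1")


# Write side of the "Unicode sandwich": text becomes bytes at the boundary.
raw = encode_field("café", width=8)

# Read side: the right encoding round-trips, the wrong one silently garbles.
good = decode_field(raw, encoding="utf-8")
garbled = decode_field(raw)  # ASCII fails, latin-1 fallback is used

# In Python 3, bytes and str never compare equal, even for pure ASCII.
same_type_needed = (b"abc" == "abc")
```

Here `good` is `"café"`, while `garbled` is a six-character mis-decoding of the same bytes, and `same_type_needed` is `False`. Width is counted in bytes rather than characters, which is why a four-character string occupies five of the eight bytes before padding.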