
I think we can implement viewers for strings as ndarray subclasses. Then one could do `my_string_array.view(latin_1)`, and so on. Essentially that just changes the default encoding of the 'S' array. That could also work for uint8 arrays if needed.
Chuck
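Chuck's viewer idea could be sketched roughly like this in pure Python (a sketch only; `Latin1Array` is a hypothetical subclass, not an actual NumPy API, and a real implementation would live at the dtype/C level):

```python
import numpy as np

class Latin1Array(np.ndarray):
    """Hypothetical viewer subclass: presents an 'S' array as latin-1 text."""

    def __getitem__(self, index):
        item = super().__getitem__(index)
        if isinstance(item, bytes):
            # decode raw bytes into text on the way out
            return item.decode('latin-1')
        return item

    def __setitem__(self, index, value):
        if isinstance(value, str):
            # encode text back into raw bytes on the way in
            value = value.encode('latin-1')
        super().__setitem__(index, value)

a = np.array([b'caf\xe9', b'na\xefve'], dtype='S5')
v = a.view(Latin1Array)   # same buffer, different default encoding
print(v[0])               # 'café'
```

The same shim would work on a uint8 buffer, since `view` only reinterprets the existing bytes.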
To handle structured data-types containing encoded strings, we'd also need to subclass `np.void`. Things would get messy when a structured dtype contains two strings in different encodings (or, more likely, one bytestring and one text string) - we'd need some way to specify which fields are in which encoding, and using subclasses means that this can't be contained within the dtype information. So I think there's a strong argument for solving this with `dtype`s rather than subclasses.

This really doesn't seem hard, though. Something like (C-but-as-python):

```python
def ENCSTRING_getitem(ptr, arr):  # the PyArrFuncs slot
    encoded = STRING_getitem(ptr, arr)
    return encoded.decode(arr.dtype.encoding)

def ENCSTRING_setitem(val, ptr, arr):  # the PyArrFuncs slot
    val = val.encode(arr.dtype.encoding)
    # TODO: handle "safe" truncation, where safe might mean keep
    # codepoints, keep graphemes, or never allow truncation
    STRING_setitem(val, ptr, arr)
```

We'd probably need to be careful to do a decode/encode dance when copying from one encoding to another, but we [already have bugs](https://github.com/numpy/numpy/issues/3258) in those cases anyway.

Is it reasonable that the user of such an array would want to work with plain `builtin.unicode` objects, rather than some special numpy scalar type?

Eric
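The decode/encode dance Eric mentions can be illustrated in plain Python (illustrative values only): a raw byte copy between encodings is wrong whenever the same codepoint has different byte lengths in the two encodings.

```python
# 'é' is one byte in latin-1 but two bytes in utf-8, so copying a
# latin-1 field into a utf-8 field must decode then re-encode,
# never memcpy the raw bytes.
latin1_bytes = b'caf\xe9'
text = latin1_bytes.decode('latin-1')   # 'café'
utf8_bytes = text.encode('utf-8')       # b'caf\xc3\xa9'
assert len(utf8_bytes) == len(latin1_bytes) + 1
```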