Terry Reedy writes:
> The fundamental problem I am interested in is the separation of raw data from how to use it info.
But this is ambiguous. Take reStructuredText. It *is* text/plain. But it also *is* application/x-structuredtext. Not to forget application/octet-stream. An MUA will treat it as the first, docutils as the second, and gzip as the third.
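To make the ambiguity concrete, here is a rough sketch of one byte string being handled three ways. The sample document and the title-underline check are my own stand-ins for illustration; the check is deliberately crude and is not the docutils API.

```python
import gzip

# One byte string; the sample content is made up for illustration.
raw = b"Title\n=====\n\nSome *emphasized* text.\n"

# As text/plain: an MUA just decodes the bytes and displays them.
as_text = raw.decode("ascii")

# As application/octet-stream: gzip neither knows nor cares what the bytes mean.
compressed = gzip.compress(raw)
round_tripped = gzip.decompress(compressed)

# As reStructuredText: a docutils-like consumer looks for markup.
# Crude stand-in: is the second line an "=" underline at least as long as the title?
lines = as_text.splitlines()
looks_like_rst_title = (
    len(lines) >= 2
    and set(lines[1]) == {"="}
    and len(lines[1]) >= len(lines[0])
)
```

All three readings succeed on the same bytes, so nothing in the bytes themselves says which "how to use it" label is the right one to attach.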
> My underlying idea is that maybe the standard Python distribution should promote encapsulation of encoding info with raw bytes to make bug-free usage easier.
I think you will find that every use case makes different demands on this feature, and that it typically interacts with higher-level needs of the application. There's a reason that ASN.1 is insanely complex and only applications that really need it ever use it. This feature will either be too simple to serve most practical needs, or too complex to serve most practical programmers. <wink>
And "bug-free" usage is hopeless. Much of the encoding information, perhaps the vast majority, will be automatically derived from sources you deprecate as "heuristic", like MIME Content-Type headers. It will get attached to the bytes as an attribute, and after that you can't know how reliable it is.
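As a minimal sketch of what I mean, assume a hypothetical TaggedBytes class (not anything in the stdlib or in the email package) that carries an encoding attribute, and a header that simply got the charset wrong:

```python
class TaggedBytes(bytes):
    """Hypothetical: bytes plus an encoding label of unknown provenance."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        self.encoding = encoding  # no record of where this label came from
        return self

    def text(self):
        # Trusts the attached label unconditionally.
        return self.decode(self.encoding)


# The payload really is UTF-8 ...
payload = "na\u00efve caf\u00e9".encode("utf-8")

# ... but suppose a Content-Type header claimed charset=latin-1.
tagged = TaggedBytes(payload, "latin-1")

wrong = tagged.text()            # decodes without raising, but is mojibake
right = payload.decode("utf-8")  # what the sender actually meant
```

Nothing raises here. The label looks exactly as authoritative as one the application set itself, and the consumer gets garbage anyway.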
If you have a practical example of such a simple class (bytes + encoding attribute) that serves as a base for more complex applications, I'd really like to see it. But until there are real use cases on the table, I have to say I can't see the proposed facility as being particularly useful to the email package, for example.