Stephen J. Turnbull wrote:
> No, it doesn't. It means 'abc' followed by something that cannot be encoded by any codec without the surrogateescape handler. 'ascii-compatible' merely defaults to that handler. I wouldn't actually be too upset if I were told, no, you have to specify explicitly.
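
For concreteness, here is a minimal sketch of the behaviour being described, using only Python's existing surrogateescape error handler (PEP 383). It shows nothing of the proposed representation itself; the variable names are purely illustrative, and UTF-8 stands in for "any codec" in the strict case.

```python
# Bytes 61 62 63 FF: ASCII 'abc' followed by a byte that is not valid ASCII.
raw = b"abc\xff"

# Decoding with surrogateescape maps the undecodable byte 0xFF to the lone
# surrogate U+DCFF, giving the code points 0061 0062 0063 DCFF.
text = raw.decode("ascii", errors="surrogateescape")
code_points = [f"{ord(ch):04X}" for ch in text]

# A strict encode of the lone surrogate fails (shown here for UTF-8).
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# With the handler, the original byte comes back unchanged.
round_trip = text.encode("ascii", errors="surrogateescape")
```

Here code_points holds the four values 0061, 0062, 0063 and DCFF, the strict encode raises UnicodeEncodeError, and round_trip is equal to the original bytes.
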

If I understand correctly, your intention is that 61 62 63 FF in this representation would simply be a more compact version of 0061 0062 0063 DCFF, with exactly the same semantics. If that's right, then maybe something like "compressed surrogateescape" or "8-bit surrogateescape" would be a better name for it?

Also, it could be produced automatically where possible by any decoding operation that specified surrogateescape -- there wouldn't have to be a dedicated encoding name for it (although there could be for convenience). It could also potentially be produced by any slicing or other string operations that resulted in characters within the appropriate ranges, just like any of the other internal representations.

--
Greg