Zooko O'Whielacronx wrote:
Following-up to my own post to correct a major error:
Is it true that srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', 'python-escape') will always produce srcbytes? That is my Requirement.
If you start with bytes, decode them with utf-8b to unicode (possibly 'invalid'), and encode the result back to bytes with utf-8b, you should get the original bytes, regardless of what they were. That is the point of PEP 383 -- to reliably roundtrip file 'names' that start as bytes and must end as the same bytes but which may not otherwise have a unicode decoding.

If you start with invalid unicode text, encode to bytes with utf-8b, and decode back to unicode, you might instead get a different and valid unicode text. An example was given in the discussion. I believe this would be hard to avoid. In any case, it does not matter for the use case of starting with bytes that one wants to temporarily but surely work with as text.

Terry Jan Reedy
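
P.S. The two directions described above can be sketched in a few lines. This assumes Python 3.1 or later, where PEP 383 was implemented as the 'surrogateescape' error handler (the thread calls it utf-8b / 'python-escape'). The roundtrip() helper and the sample byte strings are made up for illustration.

```python
import os


def roundtrip(raw: bytes) -> bytes:
    """Decode bytes to text and encode back, both with surrogateescape."""
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    text = raw.decode("utf-8", "surrogateescape")
    # Encoding turns those lone surrogates back into the original bytes.
    return text.encode("utf-8", "surrogateescape")


# bytes -> text -> bytes: the original bytes come back, valid UTF-8 or not.
samples = [b"plain.txt", b"caf\xc3\xa9", b"\xff\xfe bad", os.urandom(64)]
for raw in samples:
    assert roundtrip(raw) == raw

# text -> bytes -> text: 'invalid' text may come back as different, valid text.
# These two lone surrogates stand for the bytes 0xC3 0xA9, which together
# happen to be the valid UTF-8 encoding of U+00E9.
odd_text = "\udcc3\udca9"
as_bytes = odd_text.encode("utf-8", "surrogateescape")
back = as_bytes.decode("utf-8", "surrogateescape")
assert back != odd_text
```

The second half is only one way of constructing such a case; it is not necessarily the example given earlier in the discussion.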