I qualify with a). I believe I understand c) but, as explained in my other post, I do not think your reason applies. In fact, I think concern for naming rights might suggest that you *not* reuse the name for something different. I would have to learn more about the existing 'surrogates' handler to judge Antoine's suggestion 'surrogates-pass'. 'Surrogates-escape' is pretty good for the new handler since, to my understanding, it 'escapes' 'bad bytes' by prefixing them with bits that push them to the surrogates plane.
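To make the 'escaping' concrete, here is a minimal sketch. It assumes the spelling that shipped in the released Python 3.1, 'surrogateescape', rather than any of the names under discussion here, and the sample bytes are made up for illustration. Each undecodable byte 0xNN should come out as the lone surrogate U+DCNN, and encoding with the same handler should restore the original byte.

```python
# Sketch only: uses the handler name as released in Python 3.1
# ('surrogateescape'); the names debated in this thread differ.
raw = b"abc\xffdef"  # hypothetical input; 0xFF can never occur in valid UTF-8

# Decoding maps the bad byte 0xFF to the lone surrogate U+DC00 + 0xFF.
escaped = raw.decode("utf-8", "surrogateescape")
print(ascii(escaped))

# Encoding with the same handler turns the surrogate back into the byte.
restored = escaped.encode("utf-8", "surrogateescape")
print(restored)
```

The point of the design, as I understand it, is the round trip: bytes that are not valid in the codec survive a decode/encode cycle unchanged.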
See issue 3672. In essence, in python 2.5:

py> u"\ud800".encode("utf-8")
'\xed\xa0\x80'
py> '\xed\xa0\x80'.decode("utf-8")
u'\ud800'

In 3.1,

py> "\ud800".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
py> "\ud800".encode("utf-8","surrogates")
b'\xed\xa0\x80'
py> b'\xed\xa0\x80'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: illegal encoding
py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
'\ud800'

Regards,
Martin