Glenn Linderman writes:
On 8/26/2014 4:31 AM, MRAB wrote:
On 2014-08-26 03:11, Stephen J. Turnbull wrote:
Nick Coghlan writes:
How about:
replace_surrogate_escapes(s, replacement='\uFFFD')
If you want them removed, just pass an empty string as the replacement.
That seems better to me (I had too much C for breakfast, I think).
And further, replacement could be a vector of 128 characters, to do immediate transcoding,
Using what encoding? If you knew that much, why didn't you use (write, if necessary) an appropriate codec? I can't envision this being useful. OTOH, I could see using replace_surrogate_escapes(s, replacement='&#xFFFD;') in HTML. (Actually, probably not; if it makes sense to use Unicode features you're probably using Unicode as the external encoding, so a character entity is silly. But there might be contexts with useful multicharacter replacements.)
or a single character to do wholesale replacement with some gibberish character, or None to remove (or an empty string).
Not None, that means default (which should be the Unicode standard REPLACEMENT CHARACTER U+FFFD).
Steve
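
For illustration only, here is a minimal pure-Python sketch of what the proposed function might look like. It is not an existing standard-library function: the name and signature follow the proposal quoted above, treating None as "use the default" is an assumption taken from the last remark, and the regex approach is just one possible way to do it. It relies only on the fact that the 'surrogateescape' error handler maps undecodable bytes 0x80..0xFF to the lone surrogates U+DC80..U+DCFF.

```python
import re

# Lone surrogates U+DC80..U+DCFF are what the 'surrogateescape' error
# handler produces for undecodable bytes 0x80..0xFF.
_SURROGATE_ESCAPE_RE = re.compile('[\udc80-\udcff]')


def replace_surrogate_escapes(s, replacement='\ufffd'):
    """Hypothetical helper: replace each surrogate escape in s."""
    if replacement is None:
        # Assumption: None means "use the default", not "remove".
        replacement = '\ufffd'
    # A function is used as the substitution so that backslashes in
    # replacement are taken literally rather than as regex escapes.
    return _SURROGATE_ESCAPE_RE.sub(lambda m: replacement, s)


# b'\xe9' is not valid UTF-8 here, so it decodes to the escape '\udce9'.
s = b'caf\xe9'.decode('utf-8', 'surrogateescape')
print(ascii(replace_surrogate_escapes(s)))              # default U+FFFD
print(ascii(replace_surrogate_escapes(s, '')))          # removal
print(ascii(replace_surrogate_escapes(s, '&#xFFFD;')))  # multicharacter
```

Passing an empty string removes the escapes, and a longer string such as an HTML character reference works the same way, since the replacement is inserted once per escaped byte.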