[Python-Dev] Bytes path related questions for Guido
Stephen J. Turnbull
stephen at xemacs.org
Thu Aug 28 03:08:43 CEST 2014
Glenn Linderman writes:
> On 8/26/2014 4:31 AM, MRAB wrote:
> > On 2014-08-26 03:11, Stephen J. Turnbull wrote:
> >> Nick Coghlan writes:
> > How about:
> >
> > replace_surrogate_escapes(s, replacement='\uFFFD')
> >
> > If you want them removed, just pass an empty string as the
> > replacement.
That seems better to me (I had too much C for breakfast, I think).
> And further, replacement could be a vector of 128 characters, to do
> immediate transcoding,
Using what encoding? If you knew that much, why didn't you use
(write, if necessary) an appropriate codec? I can't envision this
being useful.
OTOH, I could see using
replace_surrogate_escapes(s, replacement='&#xFFFD;')
in HTML. (Actually, probably not; if it makes sense to use Unicode
features you're probably using Unicode as the external encoding, so a
character entity is silly. But there might be contexts with useful
multicharacter replacements.)
> or a single character to do wholesale replacement with some
> gibberish character, or None to remove (or an empty string).
Not None, that means default (which should be the Unicode standard
REPLACEMENT CHARACTER U+FFFD).
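For concreteness, here is a rough sketch of the semantics discussed above. It is not an existing stdlib function; the name and signature are just the ones proposed in this thread, and the regex-based implementation is my own assumption. It treats the lone surrogates U+DC80..U+DCFF (what the 'surrogateescape' error handler produces for undecodable bytes) as the things to replace, with None meaning "use the default U+FFFD" and an empty string meaning "remove".

```python
import re

# Lone surrogates U+DC80..U+DCFF are what the 'surrogateescape' error
# handler emits for undecodable bytes 0x80..0xFF.
_SURROGATE_ESCAPES = re.compile('[\udc80-\udcff]')


def replace_surrogate_escapes(s, replacement=None):
    """Hypothetical helper following the proposal in this thread.

    replacement=None means "use the default", i.e. U+FFFD.
    replacement='' removes the escaped bytes entirely.
    Any other string (one or more characters) is substituted as-is.
    """
    if replacement is None:
        replacement = '\ufffd'
    # Use a function as the replacement so that backslashes in
    # `replacement` are not interpreted as regex escapes.
    return _SURROGATE_ESCAPES.sub(lambda m: replacement, s)


# Example: two undecodable bytes smuggled through surrogateescape.
raw = b'abc\xff\x80'.decode('utf-8', 'surrogateescape')
cleaned = replace_surrogate_escapes(raw)                 # default U+FFFD
stripped = replace_surrogate_escapes(raw, '')            # removal
as_entity = replace_surrogate_escapes(raw, '&#xFFFD;')   # multicharacter
```

Accepting an arbitrary string rather than a single character is what makes both the "remove" and the multicharacter cases fall out for free.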
Steve