[Python-Dev] Bytes path related questions for Guido

Stephen J. Turnbull stephen at xemacs.org
Thu Aug 28 03:08:43 CEST 2014


Glenn Linderman writes:
 > On 8/26/2014 4:31 AM, MRAB wrote:
 > > On 2014-08-26 03:11, Stephen J. Turnbull wrote:
 > >> Nick Coghlan writes:

 > > How about:
 > >
 > >     replace_surrogate_escapes(s, replacement='\uFFFD')
 > >
 > > If you want them removed, just pass an empty string as the
 > > replacement.

That seems better to me (I had too much C for breakfast, I think).

 > And further, replacement could be a vector of 128 characters, to do
 > immediate transcoding,

Using what encoding?  If you knew that much, why didn't you use
(write, if necessary) an appropriate codec?  I can't envision this
being useful.
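As a rough illustration of the codec route, assume the text was produced by decoding as UTF-8 with the `surrogateescape` error handler (PEP 383). That is the handler Python uses for undecodable bytes in OS data such as POSIX file names. Under that assumption, the escaped string can be turned back into its original bytes and then decoded with the encoding you actually know, with no 128-entry table involved. `recover_with_codec` is a hypothetical helper name used only for this sketch:

```python
def recover_with_codec(s: str, encoding: str) -> str:
    """Hypothetical helper: re-decode a str that was decoded as UTF-8
    with 'surrogateescape', using the encoding the bytes really use."""
    # Encoding with the same codec and handler restores the original bytes,
    # because each escaped byte 0x80..0xFF was stored as U+DC80..U+DCFF.
    raw = s.encode('utf-8', 'surrogateescape')
    return raw.decode(encoding)


raw = b'caf\xe9'  # Latin-1 bytes, not valid UTF-8
s = raw.decode('utf-8', 'surrogateescape')  # 'caf\udce9'
print(recover_with_codec(s, 'latin-1'))  # café
```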

OTOH, I could see using

    replace_surrogate_escapes(s, replacement='&#xFFFD;')

in HTML.  (Actually, probably not; if it makes sense to use Unicode
features you're probably using Unicode as the external encoding, so a
character entity is silly.  But there might be contexts where a
multicharacter replacement is useful.)

 > or a single character to do wholesale replacement with some
 > gibberish character, or None to remove (or an empty string).

Not None, that means default (which should be the Unicode standard
REPLACEMENT CHARACTER U+FFFD).
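A minimal sketch of the function under discussion, with these semantics: `None` selects the default U+FFFD, an empty string deletes the escapes, and any other string, including a multicharacter one such as an HTML character entity, is substituted for each escaped byte. `replace_surrogate_escapes` is only the name proposed in this thread, not an existing standard-library function, and the body below is an assumption:

```python
def replace_surrogate_escapes(s, replacement=None):
    """Hypothetical sketch: replace each surrogate-escaped byte in s.

    None means the default U+FFFD REPLACEMENT CHARACTER; '' removes the
    escapes; any other (possibly multicharacter) string is inserted as is.
    """
    if replacement is None:
        replacement = '\uFFFD'
    # 'surrogateescape' maps undecodable bytes 0x80..0xFF to U+DC80..U+DCFF.
    return ''.join(replacement if '\udc80' <= ch <= '\udcff' else ch
                   for ch in s)


s = b'caf\xe9 \xff'.decode('utf-8', 'surrogateescape')  # 'caf\udce9 \udcff'
print(ascii(replace_surrogate_escapes(s)))        # 'caf\ufffd \ufffd'
print(repr(replace_surrogate_escapes(s, '')))     # 'caf '
print(replace_surrogate_escapes(s, '&#xFFFD;'))   # caf&#xFFFD; &#xFFFD;
```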

Steve

