[Python-Dev] Bytes path related questions for Guido
Nick Coghlan
ncoghlan at gmail.com
Sun Aug 24 17:26:43 CEST 2014
On 25 August 2014 00:23, Antoine Pitrou <antoine at python.org> wrote:
> On 24/08/2014 09:04, Nick Coghlan wrote:
>> Serhiy & Ezio convinced me to scale this one back to a proposal for
>> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that
>> may be produced by surrogateescape (that's what string.clean() above
>> was supposed to be, but my description was not correct, and the name
>> was too vague for that error to be obvious to the reader)
>
>
> "clean" conveys the wrong meaning. It should use a scary word such as
> "trap". "Cleaning" surrogates is unlikely to be the right procedure when
> dealing with surrogates produced by undecodable byte sequences.
"purge_surrogate_escapes" was the other term that occurred to me.
Either way, my use case is to filter them out when I *don't* want to
pass them along to other software, but would prefer the Unicode
replacement character to the ASCII question mark produced by the
"replace" error handler when encoding.
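A minimal sketch of that use case follows. It assumes the text was decoded with the surrogateescape error handler, which maps each undecodable byte 0x80-0xFF to a lone surrogate in U+DC80..U+DCFF. `purge_surrogate_escapes` is a hypothetical helper named after the proposal above; it is not an existing standard library function. The example contrasts it with the "replace" error handler on encode.

```python
import re

# Hypothetical helper, not part of the codecs module or any stdlib API.
# surrogateescape turns each undecodable byte 0x80-0xFF into a lone
# surrogate in the range U+DC80..U+DCFF; match exactly that range.
_ESCAPED_BYTE = re.compile("[\udc80-\udcff]")


def purge_surrogate_escapes(s: str) -> str:
    """Replace each surrogate-escaped byte with U+FFFD REPLACEMENT CHARACTER."""
    return _ESCAPED_BYTE.sub("\ufffd", s)


raw = b"caf\xe9.txt"  # Latin-1 bytes: 0xE9 is not valid UTF-8 here
name = raw.decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'

# The "replace" error handler substitutes an ASCII '?' when encoding.
print(name.encode("utf-8", "replace"))  # b'caf?.txt'

# Purging first keeps a visible U+FFFD (EF BF BD in UTF-8) instead.
print(purge_surrogate_escapes(name).encode("utf-8"))  # b'caf\xef\xbf\xbd.txt'
```

The regex cannot distinguish a surrogate-escaped byte from a lone surrogate in the same range that arrived some other way; both are replaced.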
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia