[New-bugs-announce] [issue18814] Add tools for "cleaning" surrogate escaped strings

Nick Coghlan report at bugs.python.org
Fri Aug 23 06:02:32 CEST 2013


New submission from Nick Coghlan:

Prompted by issue 18713 and http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/, here are some possible utilities we could add to the codecs module to help deal with/debug issues related to surrogate escaped strings:

    def has_escaped_bytes(s):
        """Return True if the string contains surrogate escaped bytes"""
        ...

    def replace_escaped_bytes(s):
        """Replaces each surrogate escaped byte with a valid code point"""
        ...

    def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
        """Reinterprets incorrectly decoded text using a new encoding"""
        return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)
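
As a rough illustration of how the two stubbed helpers might behave, here is a minimal sketch. It assumes that "surrogate escaped bytes" means the lone surrogates U+DC80 through U+DCFF that the 'surrogateescape' error handler produces for undecodable bytes 0x80 through 0xFF. The module-level pattern `_ESCAPED`, the `replacement` parameter, and its U+FFFD default are hypothetical choices made for this example, not part of the proposal. `decode_escaped_bytes` is copied from above so the block runs on its own.

```python
import re

# Assumption: 'surrogateescape' maps each undecodable byte 0x80..0xFF
# to a lone surrogate in the range U+DC80..U+DCFF.
_ESCAPED = re.compile('[\udc80-\udcff]')


def has_escaped_bytes(s):
    """Return True if the string contains surrogate escaped bytes"""
    return _ESCAPED.search(s) is not None


def replace_escaped_bytes(s, replacement='\ufffd'):
    """Replace each surrogate escaped byte with a valid code point"""
    # Hypothetical default: U+FFFD REPLACEMENT CHARACTER.
    return _ESCAPED.sub(replacement, s)


def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
    """Reinterpret incorrectly decoded text using a new encoding"""
    return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)


# Latin-1 bytes wrongly decoded as UTF-8: the byte 0xE9 gets escaped.
raw = 'caf\xe9'.encode('latin-1')
mangled = raw.decode('utf-8', 'surrogateescape')

print(has_escaped_bytes(mangled))
print(ascii(replace_escaped_bytes(mangled)))
print(ascii(decode_escaped_bytes(mangled, 'utf-8', 'latin-1')))
```

Note that the replacement helper is lossy, while the re-decoding helper recovers the original text only when the actual encoding is known.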

----------
messages: 195937
nosy: ncoghlan
priority: normal
severity: normal
stage: needs patch
status: open
title: Add tools for "cleaning" surrogate escaped strings
type: enhancement
versions: Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________