[issue7615] unicode_escape codec does not escape quotes
Richard Hansen
report at bugs.python.org
Thu Jan 7 23:42:58 CET 2010
Richard Hansen <rhansen at bbn.com> added the comment:
> We'll need a patch that implements single and double quote escaping
> for unicode_escape and a \uXXXX style escaping of quotes for the
> raw_unicode_escape encoder.
OK, I'll remove unicode_escape_single_quotes.patch and update unicode_escape_reorg.patch.
> Other changes are not necessary.
Would you please clarify? There are a few other (minor) bugs that were discovered while writing unicode_escape_reorg.patch that I think should be fixed:
* the UTF-16 surrogate pair decoding logic could read past the end of the provided Py_UNICODE character array if the last character is a high surrogate (0xD800 through 0xDBFF)
* _PyString_Resize() will be called on an empty string if the size argument of unicodeescape_string() is 0. This raises a SystemError: _PyString_Resize() can only be called if the object's reference count is 1 (even if no resizing is to take place), yet PyString_FromStringAndSize() returns a shared empty string instance if size is 0.
* it is unclear what unicodeescape_string() should do if size < 0
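To illustrate the first item, here is a minimal Python sketch of surrogate pair combining with the missing bounds check. It is not the C code from unicodeobject.c; the function name code_points and the list-of-integers input are assumptions made for illustration only.

```python
def code_points(units):
    """Combine UTF-16 code units into code points.

    Illustrative sketch only (hypothetical helper, not CPython's C code).
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        ch = units[i]
        i += 1
        # Look at the following unit only if one exists; without the
        # "i < n" test a trailing high surrogate would read past the end.
        if 0xD800 <= ch <= 0xDBFF and i < n:
            ch2 = units[i]
            if 0xDC00 <= ch2 <= 0xDFFF:
                ch = 0x10000 + ((ch - 0xD800) << 10) + (ch2 - 0xDC00)
                i += 1
        out.append(ch)
    return out
```

With this check a lone trailing high surrogate is passed through unchanged instead of being paired with whatever follows the buffer.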
Beyond those issues, I'm worried about maintainability given the amount of code duplication. If a bug is found in one of those encoding functions, the other two will likely need the same fix.
> The pickle copy of the codec can be left untouched (both cPickle.c
> and pickle.py) - it doesn't matter whether quotes are escaped or not
> in the pickle data stream.
Unfortunately, pickle.py must be modified because it does its own backslash escaping before encoding with the raw_unicode_escape codec. This means that backslashes would become double-escaped and the decoded value would differ from the original (confirmed by running the pickle unit tests).
The (minor) bugs in PyUnicode_EncodeRawUnicodeEscape() are also present in cPickle.c, so they should probably be fixed as well.
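The double-escaping concern can be sketched roughly as follows, written for Python 3. Both helpers are hypothetical: pickle_style_pre_escape is a simplified stand-in for the escaping pickle.py does on its own, and hypothetical_escaping_encoder assumes an encoder that also escapes backslashes (ASCII input only), which is not what the stock codec does.

```python
def pickle_style_pre_escape(s):
    # Simplified stand-in for pickle.py escaping backslashes itself
    # before handing the text to the codec.
    return s.replace("\\", "\\u005c")


def hypothetical_escaping_encoder(s):
    # ASSUMPTION: models a raw_unicode_escape encoder that escapes
    # backslashes on its own. Handles ASCII input only.
    return s.replace("\\", "\\u005c").encode("ascii")


def decode(data):
    # The real decoder, unchanged.
    return data.decode("raw_unicode_escape")


original = "a\\b"  # three characters: a, backslash, b

# Pre-escaping plus an escaping encoder: the backslash is escaped twice.
encoded = hypothetical_escaping_encoder(pickle_style_pre_escape(original))
round_tripped = decode(encoded)

# Pre-escaping plus the stock encoder, which leaves backslashes alone.
stock = decode(pickle_style_pre_escape(original).encode("raw_unicode_escape"))
```

Here stock equals original, while round_tripped still contains the literal text u005c after the backslash, so the value no longer survives a round trip.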
> The codecs' encode direction is not defined anywhere in the
> documentation, AFAIK, and basically an implementation detail.
I read the escape codec documentation (see the original post) as implying that the encoders can generate eval-able string literals. I'll add some clarifying statements.
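A small Python 3 sketch of what "eval-able" means here. The helper names as_literal and is_evalable are made up for illustration; the only library calls are str.encode with the unicode_escape codec and ast.literal_eval.

```python
import ast


def as_literal(s, quote="'"):
    # Wrap the unicode_escape output in quotes, the way a reader of the
    # codec documentation might try to build a string literal.
    body = s.encode("unicode_escape").decode("ascii")
    return quote + body + quote


def is_evalable(source):
    # True if the text parses as a Python literal.
    try:
        ast.literal_eval(source)
    except SyntaxError:
        return False
    return True
```

Because the encoder leaves quote characters alone, as_literal("it's") yields text that does not parse, while strings without the delimiter quote (including ones with tabs or newlines) do round-trip.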
Thanks for the feedback!
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7615>
_______________________________________