[Python-Dev] Security implications of pep 383

Toshio Kuratomi a.badger at gmail.com
Thu Mar 31 00:39:29 CEST 2011


On Wed, Mar 30, 2011 at 08:36:43AM +0200, Lennart Regebro wrote:
> On Wed, Mar 30, 2011 at 07:54, Toshio Kuratomi <a.badger at gmail.com> wrote:
> > Lennart is missing that you just need to use the same encoding
> > + surrogateescape (or stick with bytes) for decoding the byte strings that
> > you are comparing.
> 
> You lost me here. I need to do this for what?
"""
The lesson here seems to be "if you have to use blacklists, and you
use unicode strings for those blacklists, also make sure the string
you compare with doesn't have surrogates".
"""

Really, surrogates are a red herring to this whole issue.  The issue is that
the original code was trying to compare two different transformations of
byte sequences and expecting them to be equal.  Let's say that you have the
following byte value::
  b_test_value = b'\xa4\xaf'

This could be the contents of a file, a filename on a Unix filesystem,
a value stored in a database, or any number of other things.  Now you want
to compare it to another piece of data that you've read in from somewhere
outside of Python.  You'd expect any of the following to work::
  b_test_value == b_other_byte_value
  b_test_value.decode('utf-8', 'surrogateescape') == b_other_byte_value.decode('utf-8', 'surrogateescape')
  b_test_value.decode('latin-1') == b_other_byte_value.decode('latin-1')
  b_test_value.decode('euc_jp') == b_other_byte_value.decode('euc_jp')

You wouldn't expect this to work::
  b_test_value.decode('latin-1') == b_other_byte_value.decode('euc_jp')

Once you see that, you realize that the following is only a specific case of
the former; surrogateescape doesn't really matter::
  b_test_value.decode('utf-8', 'surrogateescape') == b_other_byte_value.decode('euc_jp')
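
To make the comparisons above concrete, here is a small sketch for Python 3.
It assumes that b_other_byte_value happens to hold the same two bytes as
b_test_value; that assignment and the result variable names are only for
illustration and are not part of the original problem.

```python
# Illustrative assumption: both values hold the same raw bytes,
# as if read from two different places outside of Python.
b_test_value = b'\xa4\xaf'
b_other_byte_value = b'\xa4\xaf'

# Same transformation on both sides: the comparison holds.
same_bytes = b_test_value == b_other_byte_value
same_utf8 = (b_test_value.decode('utf-8', 'surrogateescape')
             == b_other_byte_value.decode('utf-8', 'surrogateescape'))
same_latin1 = (b_test_value.decode('latin-1')
               == b_other_byte_value.decode('latin-1'))
same_euc_jp = (b_test_value.decode('euc_jp')
               == b_other_byte_value.decode('euc_jp'))

# Different transformations on each side: the comparison fails,
# whether or not surrogateescape is involved.
mixed_latin1_euc_jp = (b_test_value.decode('latin-1')
                       == b_other_byte_value.decode('euc_jp'))
mixed_utf8_euc_jp = (b_test_value.decode('utf-8', 'surrogateescape')
                     == b_other_byte_value.decode('euc_jp'))

# Encoding back with the same codec and error handler recovers the bytes.
round_trip = (b_test_value.decode('utf-8', 'surrogateescape')
              .encode('utf-8', 'surrogateescape'))

print(same_bytes, same_utf8, same_latin1, same_euc_jp)
print(mixed_latin1_euc_jp, mixed_utf8_euc_jp)
print(round_trip == b_test_value)
```

The two mixed comparisons fail for the same reason: each side went through
a different decoding, so the resulting strings differ even though the
underlying bytes are identical.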

-Toshio