[Python-Dev] Security implications of pep 383
Terry Reedy
tjreedy at udel.edu
Thu Mar 31 04:30:43 CEST 2011
On 3/30/2011 6:39 PM, Toshio Kuratomi wrote:
> Really, surrogates are a red herring to this whole issue. The issue is that
> the original code was trying to compare two different transformations of
> byte sequences and expecting them to be equal. Let's say that you have the
> following byte value::
> b_test_value = b'\xa4\xaf'
>
> This is something that's stored in a file or the filename of something on
> a unix filesystem or stored in a database or any number of other things.
> Now you want to compare that to another piece of data that you've read in
> from somewhere outside of python. You'd expect any of the following to
> work::
> b_test_value == b_other_byte_value
> b_test_value.encode('utf-8', 'surrogateescape') == b_other_byte_value('utf-8', 'surrogateescape')
> b_test_value.encode('latin-1') == b_other_byte_value('latin-1')
> b_test_value.encode('euc_jp') == b_other_byte_value('euc_jp')
>
> You wouldn't expect this to work::
> b_test_value.encode('latin-1') == b_other_byte_value('euc_jp')
>
> Once you see that, you realize that the following is only a specific case of
> the former, surrogateescape doesn't really matter::
> b_test_value.encode('utf-8', 'surrogateescape') == b_other_byte_value('euc_jp')
All the encodes above should be decodes instead. Aside from that, your
point is correct, and not limited to CS. The whole art of disguise, for
instance, is about effecting a transformation to falsely pass or fail an
identity or equality comparison.
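
With decode substituted for encode (and the method call written out on the
right-hand side), the quoted comparisons can be sketched roughly as below.
This is only an illustration: b_other_byte_value is assumed here to hold the
same two bytes as b_test_value, and the result names are made up for the
example.

```python
# Sketch of the quoted comparisons with bytes.decode() in place of encode().
b_test_value = b'\xa4\xaf'
# Assumption for illustration: the "other" data holds the same raw bytes.
b_other_byte_value = b'\xa4\xaf'

# The same transformation applied to both sides: all of these compare equal.
same_bytes = b_test_value == b_other_byte_value
same_surrogate = (b_test_value.decode('utf-8', 'surrogateescape')
                  == b_other_byte_value.decode('utf-8', 'surrogateescape'))
same_latin1 = (b_test_value.decode('latin-1')
               == b_other_byte_value.decode('latin-1'))
same_euc_jp = (b_test_value.decode('euc_jp')
               == b_other_byte_value.decode('euc_jp'))

# Different transformations on each side: identical bytes no longer match.
mixed_codecs = (b_test_value.decode('latin-1')
                == b_other_byte_value.decode('euc_jp'))
mixed_surrogate = (b_test_value.decode('utf-8', 'surrogateescape')
                   == b_other_byte_value.decode('euc_jp'))

# surrogateescape is reversible: encoding with the same codec and error
# handler gives back the original bytes.
round_trip = (b_test_value.decode('utf-8', 'surrogateescape')
              .encode('utf-8', 'surrogateescape'))

print(same_bytes, same_surrogate, same_latin1, same_euc_jp)
print(mixed_codecs, mixed_surrogate)
print(round_trip == b_test_value)
```

The last two comparisons fail for the same reason: each side went through a
different transformation, whether or not surrogateescape was involved.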
--
Terry Jan Reedy