[Python-ideas] Processing surrogates in
Serhiy Storchaka
storchaka at gmail.com
Thu May 14 12:15:18 CEST 2015
On 14.05.15 11:48, Andrew Barnert via Python-ideas wrote:
> On Wednesday, May 13, 2015 10:31 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> On 13.05.15 19:22, Nick Coghlan wrote:
>>> Three potential expected sources of surrogates have been identified:
>>>
>>> * escaped surrogates smuggling arbitrary bytes passed through decoding
>>> by the "surrogateescape" error handler
>>> * surrogates passed through the decoding process by the
>>> "surrogatepass" error handler
>>> * decomposed surrogate pairs for astral characters
>>
>> * json
>> * pickle
>> * email
>> * nntplib
>> * SimpleHTTPRequestHandler
>> * wsgiref
>> * cgi
>> * tarfile
>> * filesystem names (os.fsdecode) and other os calls
>> * platform and sysconfig
>> * other serializers
>
> As far as I can tell, all of your extra cases are just examples of the surrogateescape error handler, which Nick already mentioned.
Not all. JSON allows surrogates to be injected as \uXXXX escapes. Pickle
protocol 0 uses the raw-unicode-escape encoding, which allows surrogates.
The UTF-7 encoding also allows surrogates. Yet another source of
surrogates is Python source code: eval(), etc.
Tkinter can produce surrogates. The XML parser unfortunately can't
(unfortunately, because this makes it impossible to handle in Python some
files generated by third-party programs). I'm not sure about sqlite3.
Any extension module, or any wrapper around a third-party library, could
potentially produce surrogates.
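
A minimal sketch of the first four cases (JSON, pickle protocol 0, UTF-7,
eval), assuming CPython 3 behaviour. The names LONE, is_utf8_encodable and
sources are only illustrative and do not come from any library. Each route
yields a str holding a lone surrogate without using surrogateescape, and
each result is then rejected by the strict UTF-8 encoder:

```python
import json
import pickle

LONE = "\ud800"  # a lone high surrogate, used as the reference value

# JSON: a \uXXXX escape naming a surrogate is decoded as written.
from_json = json.loads('"\\ud800"')

# Pickle protocol 0 stores str via raw-unicode-escape, which keeps surrogates.
from_pickle = pickle.loads(pickle.dumps(LONE, protocol=0))

# UTF-7: the codec round-trips a lone surrogate without an error handler.
from_utf7 = LONE.encode("utf-7").decode("utf-7")

# Python source: a string literal may contain a surrogate escape.
from_eval = eval(r"'\ud800'")


def is_utf8_encodable(text):
    """Return True if text survives a strict UTF-8 encode."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


sources = {
    "json": from_json,
    "pickle protocol 0": from_pickle,
    "utf-7": from_utf7,
    "eval": from_eval,
}

for name, value in sources.items():
    # Print only booleans: writing the surrogate itself to stdout could fail.
    print(name, value == LONE, is_utf8_encodable(value))
```

None of these paths goes through the surrogateescape handler, which is why
they count as separate sources rather than further examples of it.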