[issue2980] Pickle stream for unicode object may contain non-ASCII characters.

Wed Oct 22 00:56:15 CEST 2008

Dan Dibagh <dddibagh at lavabit.com> added the comment:

I am well aware why my example produces an error from a technical
standpoint. What I'm getting at is the decision to implement
PyUnicode_EncodeRawUnicodeEscape the way it is. Probably there is
nothing wrong with it, but how am I supposed to know? I read the PEP,
which serves as a specification of raw unicode escape (at least for the
decoding bit), and the reference documentation. Then I read the source,
trying to map the behavior specified in the documentation onto the
implementation. When it comes to the part that causes the problem with
non-ASCII characters, it is difficult to follow.

Or in other words: what is the high-level reason why the codec won't
escape \x80 in my test program?
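
For reference, the behavior being asked about can be reproduced in a few
lines. This is a minimal sketch in Python 3 syntax, not the original test
program; it assumes the current raw_unicode_escape codec and pickle
protocol 0 behave like the ones under discussion, and it uses only
standard-library calls.

```python
import pickle

# U+0080 is below U+0100, so the codec writes it as a single Latin-1
# byte instead of a \uXXXX escape.
low = "\x80".encode("raw_unicode_escape")

# Code points at or above U+0100 do get an ASCII-only \uXXXX escape.
high = "\u20ac".encode("raw_unicode_escape")

# Protocol 0 stores str objects through the same encoding, so the
# resulting stream is not pure ASCII for this input.
stream = pickle.dumps("\x80", protocol=0)
non_ascii = [b for b in stream if b > 0x7F]

print(low, high, non_ascii)
```

The split at U+0100 is what lets a non-ASCII byte through: everything in
the Latin-1 range is copied as-is, and only higher code points are escaped.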

To use a real-world term: an interface specification, in this case the
pickle documentation, is the contract between the consumer of the
library and the provider of the library. If it states "ASCII", ASCII is
expected. If it doesn't state "for debugging only", it will be used for
non-debugging purposes. There isn't much you can do about it without
breaking the contract.

What makes you think that the problem cannot be fixed without changing
the existing pickle format 0?
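
One observation that bears on this question, sketched under the
assumption that the Python 3 codec and unpickler behave like the ones
under discussion: the raw_unicode_escape decoder accepts a \uXXXX escape
for code points below U+0100 as well, so an escaped U+0080 reads back the
same as the raw byte. The hand-written stream below is illustrative only;
whether the encoder should emit that form is the open question.

```python
import pickle

# Escaped and raw spellings of U+0080 decode to the same string.
escaped = b"\\u0080".decode("raw_unicode_escape")
raw = b"\x80".decode("raw_unicode_escape")

# A hand-written protocol 0 stream: UNICODE opcode 'V', the escaped
# text, a newline, then STOP ('.'). It contains only ASCII bytes.
ascii_stream = b"V\\u0080\n."
loaded = pickle.loads(ascii_stream)

print(escaped == raw, loaded == "\x80")
```

This only demonstrates that the decoding side tolerates the escaped
spelling; it says nothing about other consumers of protocol 0 streams.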

Note that base64 is "a common" way to deal with binary data in ASCII
streams rather than "the common" way. (But why should I care when my
data is already ASCII?)

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2980>
_______________________________________