[IPython-dev] String encoding

Fri Jun 17 02:39:46 EDT 2011

On Thu, Jun 16, 2011 at 4:51 PM, Thomas Kluyver <takowl at gmail.com> wrote:
> My own inclination is simply to say that non-ascii characters will be
> interpreted correctly in unicode literals, but their behaviour in byte
> literals is undefined, and you should use the '\xe9' notation to write bytes
> above 127. Note that Python 3 actually enforces this rule: b"ö" is a
> SyntaxError. But I'd like to collect some more thoughts, or see if we can
> come up with a way to avoid the problem (short of writing our own parser).
>
> Thanks for reading this - it took me a while to properly understand the
> problem, so I hope I've explained it clearly.

Many thanks for the excellent summary, Thomas.

I agree with you, for two reasons: it's semantically consistent with
how unicode objects work (even if slightly less convenient in py2,
where the explicit u prefix is needed), and it's the natural path for
py3.
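
As a rough illustration of the rule Thomas describes, here is a small
sketch for Python 3. The helper name `compiles` is made up for this
example, and the two encodings are just common cases chosen to show why
a raw non-ascii character in a byte literal has no single obvious
meaning:

```python
# Python 3 sketch; `compiles` is a hypothetical helper, not an IPython API.

def compiles(source):
    """Return True if `source` is a valid Python 3 expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# The same character maps to different bytes depending on the encoding,
# which is why its meaning inside a byte literal is ambiguous.
as_latin1 = "\u00e9".encode("latin-1")  # one byte
as_utf8 = "\u00e9".encode("utf-8")      # two bytes

# The escape notation names the byte explicitly, so there is no ambiguity.
explicit = b"\xe9"

# Python 3 accepts the escaped form but rejects a raw non-ascii character
# in a bytes literal.
escaped_ok = compiles('b"\\xe9"')
raw_ok = compiles('b"\u00f6"')
```

With the escape notation the byte value is fixed no matter what
encoding the terminal or source file uses, which is the property we
want for byte literals.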

I guess we could make it a configurable option, but I'm not sure it's
worth the added complexity. I'm mildly -1 on going down that road,
unless I can be convinced that the implementation isn't that
complicated and that there's really a *major* usability win for
certain users for whom the default is just too annoying to bear.

Cheers,

f