[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Mon Oct 3 08:09:13 CEST 2005

Hi.

Like a lot of people (or so I hear in the blogosphere...), I've been
experiencing some friction in my code with unicode conversion
problems.  Even when I am extra careful about which of my variables
hold str objects and which hold unicode objects, there is always some
case or oversight where something unexpected causes an implicit
conversion, which then triggers a decode error.  Call str.join() on a
list of strs where one unicode object appears unexpectedly, and voilà!
Exceptions galore.  Sometimes the problem shows up late because your
test code doesn't always contain accented characters.  I'm sure many
of you have experienced that or some variant at some point.
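
To make the join case concrete, here is a minimal sketch.  It is
written for a current Python interpreter, so bytes stands in for the
str of this post and str stands in for unicode.  implicit_join is a
hypothetical helper that only imitates the implicit ASCII decode; it is
not the real str.join() implementation.

```python
def implicit_join(sep, parts):
    """Hypothetical imitation of the implicit coercion described above.

    Assumption: bytes plays the role of the old str, str plays unicode.
    If any part is text, every byte string is silently decoded as ASCII.
    """
    if any(isinstance(p, str) for p in parts):
        text_sep = sep.decode("ascii") if isinstance(sep, bytes) else sep
        return text_sep.join(
            p if isinstance(p, str) else p.decode("ascii") for p in parts
        )
    return sep.join(parts)


# All byte strings: no conversion happens, so accents are harmless.
all_bytes = implicit_join(b", ", [b"abc", b"caf\xc3\xa9"])

# One text object sneaks in, but the data is plain ASCII: still fine,
# which is why a test suite without accents never notices.
ascii_mix = implicit_join(b", ", [b"abc", "def"])

# Same mix with an accented byte string: the hidden decode blows up.
try:
    implicit_join(b", ", [b"caf\xc3\xa9", "def"])
    failed = False
except UnicodeDecodeError:
    failed = True
```

Nothing at the call site hints that a decode is about to happen, which
is what makes the failure hard to track down.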

I came to realize recently that this problem shares a strong similarity
with the problem of implicit type conversions in C++, or at least it
feels the same:  Stuff just happens implicitly, and it's hard to track
down where and when it happens by just looking at the code.  Part of
the problem is that the unicode object acts a lot like a str, which is
convenient, but...

What if we could completely disable the implicit conversions between
unicode and str?  In other words, what if you were ALWAYS forced to
call either .encode() or .decode() to convert between one and the
other... wouldn't that help a lot in dealing with that issue?
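
A rough sketch of the behaviour I have in mind, again using bytes for
str and str for unicode so that it runs on a current interpreter.
strict_join is a hypothetical name, and the UTF-8 encoding of the
sample data is an assumption made for the example.

```python
def strict_join(sep, parts):
    """Hypothetical join that refuses to mix byte strings and text."""
    kinds = {type(sep)} | {type(p) for p in parts}
    if len(kinds) != 1:
        names = sorted(k.__name__ for k in kinds)
        raise TypeError("refusing to mix types: " + ", ".join(names))
    return sep.join(parts)


raw = b"caf\xc3\xa9"  # byte string, assumed to be UTF-8 encoded

# Mixing is rejected up front, whether or not the data has accents.
try:
    strict_join(", ", [raw, "def"])
    rejected = False
except TypeError:
    rejected = True

# The conversion has to be spelled out, so it is visible in the code.
joined = strict_join(", ", [raw.decode("utf-8"), "def"])
```

The error now depends only on the types involved, not on the contents
of the data, so even an ASCII-only test would catch the mix-up.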

How hard would that be to implement?  Would it break a lot of code? 
Would some people want that?  (I know I would, at least for some of my
code.)  It seems to me that this would make the code more explicit and
force the programmer to become more aware of those conversions.  Any
opinions welcome.

cheers,