James Y Knight wrote:
> On Jul 15, 2006, at 3:15 PM, M.-A. Lemburg wrote:
>> Note that it also helps to set the default encoding to 'unknown'. That way you disable the coercion of strings to Unicode, and all the places where this implicit conversion takes place crop up, allowing you to take proper action (i.e. explicit conversion or changing the string to Unicode, as appropriate).
> I've tried that before to verify no such conversion issues occurred in Twisted, but, as the Python stdlib isn't usable like that, it's hard to use it to find bugs in any other libraries. (In particular, the re module is badly broken with that setting, and some other stuff was too.)
True: it breaks a lot of code that was written to work with both strings and Unicode (or does so by accident ;-).
The stdlib isn't too well prepared for Unicode yet, so if your code relies a lot on it, then the above may not be the right strategy for you.
Perhaps a new ASCII codec that issues warnings for all these cases would help? (One that still converts to Unicode assuming ASCII, but issues a warning pointing to the location in the code where the conversion happened.)
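
A rough sketch of what such a warning codec might look like, using only the standard codecs and warnings modules. The codec name "ascii_warn" and the helper names are made up for illustration; nothing like this ships with Python. The sketch only covers the codec itself. To have it catch implicit coercions it would still have to be installed as the default encoding, which is not shown here.

```python
import codecs
import warnings

# Hypothetical codec name, chosen for this sketch only.
CODEC_NAME = "ascii_warn"


def _warn_decode(data, errors="strict"):
    # stacklevel=2 attributes the warning to the code that triggered
    # the conversion rather than to this helper.
    warnings.warn("ASCII decode via %s codec" % CODEC_NAME,
                  UnicodeWarning, stacklevel=2)
    # Still perform the plain ASCII conversion, as proposed above.
    return codecs.ascii_decode(data, errors)


def _warn_encode(text, errors="strict"):
    warnings.warn("ASCII encode via %s codec" % CODEC_NAME,
                  UnicodeWarning, stacklevel=2)
    return codecs.ascii_encode(text, errors)


def _search(name):
    # Codec search function: answer only for our own name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(name=CODEC_NAME,
                                encode=_warn_encode,
                                decode=_warn_decode)
    return None


codecs.register(_search)
```

With the codec registered, a conversion that goes through it (for example b"abc".decode("ascii_warn")) still yields the ASCII result, and the warnings machinery reports where it happened. Running with warnings turned into errors would give the stricter behaviour of the disabled default encoding, but only for code you choose to treat that way.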