I'm in full agreement with Marc-Andre below, except I don't like (1)
at all -- having used other APIs that always return Unicode (like the
Python XML parsers) it bothers me to get Unicode for no reason at all.
OTOH I think Python 3.0 should be using a Unicode model closer to
Java's.
On 7/11/05, M.-A. Lemburg wrote:
Neil Hodgson wrote:
On unicode versions of Windows, for attributes like os.listdir, os.getcwd, sys.argv, and os.environ, which can usefully return unicode strings, there are 4 options I see:
1) Always return unicode. This is the option I'd be happiest to use, myself, but expect this choice would change the behaviour of existing code too much and so produce much unhappiness.
Would be nice, but will likely break too much code - if you let Unicode objects enter non-Unicode aware code, it is likely that you'll end up getting stuck in tons of UnicodeErrors. If you want to get a feeling for this, try running Python with the -U command line switch.
2) Return unicode when the text can not be represented in ASCII. This will cause a change of behaviour for existing code which deals with non-ASCII data.
+1 on this one (s/ASCII/Python's default encoding).
3) Return unicode when the text can not be represented in the default code page. While this change can lead to breakage because of combining byte strings and unicode strings, it is reasonably safe from the point of view of data integrity as current code is returning garbage strings that look like '?????'.
-1: code pages are evil and the reason why Unicode was invented in the first place. This would be a step back in history.
4) Provide two versions of the attribute, one with the current name returning byte strings and a second with a "u" suffix returning unicode. This is the least intrusive, requiring explicit changes to code to receive unicode data. For patch #1231336 I chose this approach producing sys.argvu and os.environu.
-1 - this is what Microsoft did for many of their APIs. The result is two parallel universes with two sets of features, bugs, documentation, etc.
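To make option 2 concrete, here is a minimal sketch of "return a byte string when it fits, unicode otherwise". It is written against modern Python 3, where bytes plays the role of the 2005 byte string and str the role of unicode. The helper name narrow_if_possible and its encoding parameter are hypothetical and are not part of any patch discussed here.

```python
def narrow_if_possible(text, encoding="ascii"):
    """Return a byte string if `text` fits in `encoding`, else the Unicode text.

    Hypothetical helper illustrating option 2; `encoding` stands in for
    either ASCII or Python's default encoding.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # The text cannot be represented, so hand back Unicode unchanged.
        return text
```

Note that callers of such a function have to be prepared for either type, which is where the behaviour change for code handling non-ASCII data comes from.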
For os.listdir the current behaviour of returning unicode when its argument is unicode can be retained but that is not extensible to, for example, sys.argv.
I don't think that using the parameter type as a "parameter" to the function is a good idea. However, accepting both strings and Unicode will make it easier to maintain backwards compatibility.
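For reference, the os.listdir behaviour under discussion, where the result type follows the argument type, can be shown in a few lines. This sketch uses modern Python 3 (bytes and str instead of the byte string and unicode types of 2005); the function name list_both_ways is hypothetical.

```python
import os
import tempfile


def list_both_ways(path):
    """List `path` twice: once with a str argument, once with bytes."""
    text_names = os.listdir(path)               # str in, str names out
    byte_names = os.listdir(os.fsencode(path))  # bytes in, bytes names out
    return text_names, byte_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "example.txt"), "w").close()
        print(list_both_ways(tmp))
```

sys.argv has no argument to switch on, which is why this approach does not extend to it.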
Since this issue may affect many attributes a common approach should be chosen.
Indeed.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Source (#1, Jul 11 2005)
--Guido van Rossum (home page: http://www.python.org/~guido/)