[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

M.-A. Lemburg mal at egenix.com
Mon Jul 11 16:29:22 CEST 2005


Neil Hodgson wrote:
>    On unicode versions of Windows, for attributes like os.listdir,
> os.getcwd, sys.argv, and os.environ, which can usefully return unicode
> strings, there are 4 options I see:
> 
> 1) Always return unicode. This is the option I'd be happiest to use,
> myself, but expect this choice would change the behaviour of existing
> code too much and so produce much unhappiness.

Would be nice, but it will likely break too much code: if you
let Unicode objects enter non-Unicode-aware code, you are likely
to end up stuck in tons of UnicodeErrors. If you want to get a
feeling for this, try running Python with the -U command line
switch.
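
As a rough illustration of that failure mode, here is a minimal sketch.
It is written for Python 3, where the implicit coercion no longer exists,
so py2_style_concat is a hypothetical helper (not part of any library)
that emulates how Python 2 decodes a byte string with the default ASCII
encoding when it is combined with a Unicode object.

```python
# Hypothetical helper: emulates Python 2's implicit str -> unicode
# coercion, which decodes the byte string with the default encoding.
def py2_style_concat(text, raw, default_encoding="ascii"):
    return text + raw.decode(default_encoding)

# Pure ASCII bytes mix with Unicode without trouble.
ok = py2_style_concat(u"dir: ", b"docs")

# A single non-ASCII byte is enough to blow up in code that never
# expected Unicode to show up.
try:
    py2_style_concat(u"dir: ", u"caf\u00e9".encode("latin-1"))
    failed = False
except UnicodeDecodeError:
    failed = True
```

The second call fails only because of the data, not the code path, which
is why such errors tend to surface late and in unexpected places.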

> 2) Return unicode when the text can not be represented in ASCII. This
> will cause a change of behaviour for existing code which deals with
> non-ASCII data.

+1 on this one (s/ASCII/Python's default encoding/).
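
A minimal sketch of what option 2 could look like, assuming the default
encoding is ASCII. The name narrowest_repr is made up for illustration
and is not an existing API; in Python 3 terms "byte string" is bytes and
"unicode" is str.

```python
def narrowest_repr(text, default_encoding="ascii"):
    """Return a byte string if text fits the default encoding,
    otherwise return the Unicode text unchanged."""
    try:
        return text.encode(default_encoding)
    except UnicodeEncodeError:
        return text

plain = narrowest_repr(u"README.txt")         # fits: byte string
wide = narrowest_repr(u"r\u00e9sum\u00e9.txt")  # does not fit: stays Unicode
```

Existing code that only ever sees encodable names keeps getting byte
strings; only names that would otherwise be mangled come back as Unicode.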

> 3) Return unicode when the text can not be represented in the default
> code page. While this change can lead to breakage because of combining
> byte string and unicode strings, it is reasonably safe from the point
> of view of data integrity as current code is returning garbage strings
> that look like '?????'.

-1: code pages are evil and the reason why Unicode was invented
in the first place. This would be a step back in history.
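
To make the '?????' problem from the quoted text concrete, here is a
small sketch. It assumes cp1252 as the code page purely as an example;
the actual default code page depends on the Windows installation.

```python
# A Cyrillic file name that cp1252 cannot represent.
name = u"\u041f\u0440\u0438\u0432\u0435\u0442.txt"

# Encoding with replacement turns every unrepresentable character
# into "?", so the original name cannot be recovered.
lossy = name.encode("cp1252", errors="replace")
round_trip = lossy.decode("cp1252")
```

Whether a given name survives depends entirely on which code page happens
to be active, so the same program behaves differently from one machine
to the next.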

> 4) Provide two versions of the attribute, one with the current name
> returning byte strings and a second with a "u" suffix returning
> unicode. This is the least intrusive, requiring explicit changes to
> code to receive unicode data. For patch #1231336 I chose this approach
> producing sys.argvu and os.environu.

-1 - this is what Microsoft did for many of their APIs. The
result is two parallel universes with two sets of features,
bugs, documentation, etc.

>     For os.listdir the current behaviour of returning unicode when its
> argument is unicode can be retained but that is not extensible to, for
> example, sys.argv.

I don't think that using the parameter type as a "parameter"
to the function is a good idea. However, accepting both strings
and Unicode will make it easier to maintain backwards
compatibility.
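
For reference, a short sketch of the os.listdir behaviour under
discussion, where the result type follows the argument type. It is
written for Python 3, in which the two types are str and bytes; the
temporary directory and the file name a.txt are just scaffolding for
the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so there is something to list.
    open(os.path.join(tmp, "a.txt"), "w").close()

    as_text = os.listdir(tmp)                # text path in, text names out
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes path in, bytes names out
```

This works because os.listdir takes an argument; sys.argv and os.environ
have no argument to inspect, which is the extensibility problem raised in
the quoted text.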

>    Since this issue may affect many attributes a common approach
> should be chosen.

Indeed.

-- 
Marc-Andre Lemburg
eGenix.com


