[Python-Dev] Python-3.0, unicode, and os.environ

Terry Reedy tjreedy at udel.edu
Mon Dec 8 00:53:37 CET 2008


Guido van Rossum wrote:
> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> Toshio Kuratomi wrote:
>>
>>>  - If this is true, a definition of os.listdir(<type 'str'>) that would
>>> better meet programmer expectation would be: "Give me all files in a
>>> directory with the output as str type".  The definition of
>>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>>> with the output as bytes type".  Raising an exception when the filenames
>>> are undecodable is perfectly reasonable in this situation.
>> Your examples (snipped) pretty well convince me that there is a use case for
>> raising exceptions.  We should move beyond arguing over which one way is
>> right.  I think there should be a second argument 'ignorebad=False' to
>> ignore undecodable files rather than raise the exception (or 'strict=True'
>> to stop and raise exception on non-decodable names -- then code is 'if
>> strict: raise ...').  I believe other functions have a similar parameter.

I was thinking of the "normal Unicode 'errors' parameter", as described 
by Nick.

> If you want the exceptions, just use the bytes API and try to decode
> the byte strings using the system encoding.

If it was a matter of adding a new method, I might agree.  But:

1. We already have a method that does exactly what you describe.  It is 
only a matter of adding flexibility to the response to problems, for 
which there is already precedent.

2. Suggesting that people who want strings and not bytes should have to 
deal with bytes, just to get an error notification, seems to negate the 
point of moving to 3.0.

3. A builtin would probably do so better than most programmers would, 
with little touches such as the one suggested below.

4. An error parameter would ALERT programmers to the possibility of a 
PROBLEM, both in the present and future.  As you say below, people need 
to better anticipate the future.
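
For comparison, here is roughly what the bytes-API approach asks each 
programmer to write by hand. This is only a sketch: listdir_strict and 
decode_name are hypothetical names, and os.fsencode comes from later 3.x 
releases rather than 3.0.

```python
import os
import sys


def decode_name(raw, encoding):
    """Decode one raw file name; raises UnicodeDecodeError if it cannot be decoded."""
    return raw.decode(encoding, "strict")


def listdir_strict(path, encoding=None):
    """Hypothetical helper: list a directory via the bytes API, then decode every name."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    raw_names = os.listdir(os.fsencode(path))  # bytes path in, bytes names out
    return [decode_name(raw, encoding) for raw in raw_names]
```

Every application that wants str names plus an error notification would 
have to carry some variant of this.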

> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all
> the time with the 2.x Unicode API, where the developer hadn't
> anticipated a particular input potentially containing non-ASCII bytes,
> and the user fed the application non-ASCII text. Making os.listdir
> raise an exception when a directory contains a single undecodable file
> means that the entire directory can't be read, and most likely the
> entire app crashes at that point. Most likely the developer never
> anticipated this situation (since in most places it is either
> impossible or very unlikely) -- after all, if they had anticipated it
> they would have used the bytes API in the first place. (It's worse
> because the exception being raised would be UnicodeError -- most
> people expect os.listdir to raise OSError, not other errors.)

This to me is an argument for keeping the current behavior as the 
default, but not for rejecting flexibility.  The computing world seems 
to be messier than we would like and worse than I realized until this 
week.  As you say below, people need to better anticipate the future, 
and an errors parameter would help do that.


Is Windows really immune?  What about when it reads the directory of 
possibly old removable media with whatever byte name encodings?  Is this 
a possible source of 'unanticipated' problems?

As to your last sentence, os.listdir() with an errors parameter could 
convert a decoding UnicodeError to "OSError: undecodable file name 
<ascii+hex repr>", thereby supplying the expected exception as well as 
an extractable representation of the problematic raw bytes.
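
A minimal sketch of that conversion, for one raw name. The function name 
decode_or_oserror is hypothetical, and ascii() stands in for the 
"ascii+hex repr" since it renders non-ASCII bytes as \xNN escapes.

```python
def decode_or_oserror(raw, encoding="utf-8"):
    """Decode a raw file name, reporting failure as OSError instead of UnicodeError."""
    try:
        return raw.decode(encoding, "strict")
    except UnicodeDecodeError as exc:
        # ascii(raw) gives an ASCII-only repr such as b'caf\xe9', so the
        # offending bytes can be recovered from the message.
        raise OSError("undecodable file name %s" % ascii(raw)) from exc
```

The original UnicodeDecodeError remains available as the __cause__ of the 
OSError for anyone who wants the details.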

Here is a possible use case: I want filenames as 3.0 strings and I 
anticipate no problems at present but, as you say above, something might 
happen years in the future.  I am using 3.0 *because* of the strings == 
unicode feature.  I would like to write

try:
   files = os.listdir(somedir, errors='strict')
except OSError as e:
   log(<verbose error message that includes somedir and e>)
   files = os.listdir(somedir)

and go on without the problem file but not without logging the problem 
so a future maintainer can consider what to do about it, but only when 
there is an actual need to think about it.
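
Since os.listdir has no such errors parameter today, the same 
try-strict, log, then carry-on pattern can only be approximated on 
already-fetched raw names. In this sketch decode_names, its arguments, 
and the default encoding are all assumptions made for illustration.

```python
import logging

log = logging.getLogger(__name__)


def decode_names(raw_names, encoding="utf-8", directory="<unknown>"):
    """Try a strict decode of every name; on failure, log and skip the bad ones."""
    try:
        return [raw.decode(encoding, "strict") for raw in raw_names]
    except UnicodeDecodeError as exc:
        # Record the problem so a future maintainer can see it.
        log.warning("undecodable file name in %s: %s", directory, exc)

    # Fallback: keep only the names that decode cleanly.
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError:
            continue
    return names
```

For example, decode_names([b"a", b"\xff", b"b"]) logs one warning and 
returns the two decodable names.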

Terry Jan Reedy


