[Python-Dev] Python-3.0, unicode, and os.environ

Guido van Rossum guido at python.org
Mon Dec 8 19:26:46 CET 2008


On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Guido van Rossum wrote:
>>
>> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>
>>> Toshio Kuratomi wrote:
>>>
>>>>  - If this is true, a definition of os.listdir(<type 'str'>) that would
>>>> better meet programmer expectation would be: "Give me all files in a
>>>> directory with the output as str type".  The definition of
>>>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>>>> with the output as bytes type".  Raising an exception when the filenames
>>>> are undecodable is perfectly reasonable in this situation.
>>>
>>> Your examples (snipped) pretty well convince me that there is a use case
>>> for raising exceptions.  We should move beyond arguing over which one way
>>> is right.  I think there should be a second argument 'ignorebad=False' to
>>> ignore undecodable files rather than raise the exception (or 'strict=True'
>>> to stop and raise an exception on non-decodable names -- then code is 'if
>>> strict: raise ...').  I believe other functions have a similar parameter.
>
> I was thinking of the "normal Unicode 'errors' parameter", as described by
> Nick.
>
>> If you want the exceptions, just use the bytes API and try to decode
>> the byte strings using the system encoding.
>
> If it was a matter of adding a new method, I might agree.  But:
>
> 1. We already have a method that does exactly what you describe.  It is only
> a matter of adding flexibility to the response to problems, for which there
> is already precedent.
>
> 2. Suggesting that people who want strings and not bytes should have to deal
> with bytes, just to get an error notification, seems to negate the point of
> moving to 3.0.
>
> 3. A builtin would probably do so better than most programmers would, with
> little touches such as the one suggested below.
>
> 4. An error parameter would ALERT programmers to the possibility of a
> PROBLEM, both in the present and future.  As you say below, people need to
> better anticipate the future.
>
>> My problem with raising exceptions *by default* when an undecodable
>> name exists is that it may render an app completely useless in a
>> situation where the developer is no longer around. This happened all
>> the time with the 2.x Unicode API, where the developer hadn't
>> anticipated a particular input potentially containing non-ASCII bytes,
>> and the user fed the application non-ASCII text. Making os.listdir
>> raise an exception when a directory contains a single undecodable file
>> means that the entire directory can't be read, and most likely the
>> entire app crashes at that point. Most likely the developer never
>> anticipated this situation (since in most places it is either
>> impossible or very unlikely) -- after all, if they had anticipated it
>> they would have used the bytes API in the first place. (It's worse
>> because the exception being raised would be UnicodeError -- most
>> people expect os.listdir to raise OSError, not other errors.)
>
> This to me is an argument for keeping the default the current behavior, but
> not for rejecting flexibility.  The computing world seems to be messier than
> we would like and worse than I realized until this week. As you say below,
> people need to better anticipate the future, and an errors parameter would
> help do that.

I'm fine with whatever API enhancements you can come up with (assuming
others like them too :-) as long as the default remains the current
behavior.
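
A minimal sketch of one such enhancement, assuming hypothetical helpers named
decode_names and listdir_text (neither is part of the os module). It reads
names through the bytes API, decodes them with the filesystem encoding, and by
default leaves out names that fail to decode. The strict flag is an assumption
made for illustration; setting it lets the UnicodeDecodeError propagate.

```python
import os
import sys


def decode_names(raw_names, encoding=None, strict=False):
    """Decode bytes filenames to str (hypothetical helper, not in os)."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if strict:
                raise  # opt-in: surface the undecodable name
            # default: leave the undecodable name out of the result
    return names


def listdir_text(path, strict=False):
    """List a directory via the bytes API, then decode each name."""
    # os.listdir returns bytes names when it is given a bytes path.
    return decode_names(os.listdir(os.fsencode(path)), strict=strict)
```

With an explicit "utf-8" encoding, decode_names([b"ok.txt", b"caf\xe9.txt"],
"utf-8") returns ['ok.txt'], while the same call with strict=True raises
UnicodeDecodeError.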

> Is Windows really immune?  What about when it reads the directory of
> possibly old removable media with whatever byte name encodings?  Is this a
> possible source of 'unanticipated' problems?
>
> As to your last sentence, os.listdir() with an errors parameter could
> convert a decoding UnicodeError to "OSError: undecodable file name
> <ascii+hex repr>", thereby supplying the expected exception as well as an
> extractable representation of the problematic raw bytes.
>
> Here is a possible use case: I want filenames as 3.0 strings and I
> anticipate no problems at present but, as you say above, something might
> happen years in the future.  I am using 3.0 *because* of the strings ==
> unicode feature.  I would like to write
>
> try:
>     files = os.listdir(somedir, errors='strict')
> except OSError as e:
>     log(<verbose error message that includes somedir and e>)
>     files = os.listdir(somedir)
>
> and go on without the problem file but not without logging the problem so a
> future maintainer can consider what to do about it, but only when there is
> an actual need to think about it.
>
> Terry Jan Reedy
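
The use case quoted above can be approximated without changing os.listdir.
This is only a sketch under stated assumptions: decode_or_oserror and
list_files are hypothetical names, and the errors values "skip" and "strict"
are invented for illustration (they are not codec error handlers). In strict
mode the decoding failure is re-raised as an OSError whose message carries the
repr of the raw bytes.

```python
import logging
import os
import sys

log = logging.getLogger("listdir-sketch")


def decode_or_oserror(raw_names, encoding, errors="skip"):
    """Decode bytes filenames; hypothetical stand-in for an errors parameter."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError as exc:
            if errors == "strict":
                # Supply the exception type callers expect, plus the raw bytes.
                raise OSError("undecodable file name %r" % (raw,)) from exc
            # errors="skip": leave the name out of the result.
    return names


def list_files(somedir):
    """Try strict decoding first; log and fall back to skipping on failure."""
    raw_names = os.listdir(os.fsencode(somedir))  # bytes in, bytes out
    encoding = sys.getfilesystemencoding()
    try:
        return decode_or_oserror(raw_names, encoding, errors="strict")
    except OSError as e:
        log.warning("problem listing %r: %s", somedir, e)
        return decode_or_oserror(raw_names, encoding)
```

Unlike the quoted snippet, this version reads the directory only once, so an
OSError raised by os.listdir itself (a missing directory, say) is not caught
by the fallback and reaches the caller unchanged.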



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

