[Python-Dev] Python-3.0, unicode, and os.environ
Guido van Rossum
guido at python.org
Mon Dec 8 19:26:46 CET 2008
On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Guido van Rossum wrote:
>>
>> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>
>>> Toshio Kuratomi wrote:
>>>
>>>> - If this is true, a definition of os.listdir(<type 'str'>) that would
>>>> better meet programmer expectation would be: "Give me all files in a
>>>> directory with the output as str type". The definition of
>>>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>>>> with the output as bytes type". Raising an exception when the filenames
>>>> are undecodable is perfectly reasonable in this situation.
>>>
>>> Your examples (snipped) pretty well convince me that there is a use case
>>> for raising exceptions. We should move beyond arguing over which one way
>>> is right. I think there should be a second argument 'ignorebad=False' to
>>> ignore undecodable files rather than raise the exception (or
>>> 'strict=True' to stop and raise an exception on non-decodable names --
>>> then the code is 'if strict: raise ...'). I believe other functions have
>>> a similar parameter.
>
> I was thinking of the "normal Unicode 'errors' parameter", as described by
> Nick.
>
>> If you want the exceptions, just use the bytes API and try to decode
>> the byte strings using the system encoding.
>
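A minimal sketch of the bytes-API approach described above. The helper name `listdir_strict` and its `encoding` parameter are hypothetical and not part of the os module. The helper lists the directory through a bytes path, then decodes every name strictly with the file system encoding, so an undecodable name raises an error instead of being dropped.

```python
import os
import sys

def listdir_strict(path, encoding=None):
    """Hypothetical helper: return the names in *path* as str, raising
    UnicodeDecodeError if any name cannot be decoded."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # A bytes argument makes os.listdir return the raw bytes names.
    raw_names = os.listdir(os.fsencode(path))
    # 'strict' decoding raises on the first undecodable name.
    return [name.decode(encoding, 'strict') for name in raw_names]
```

The exception that escapes here is UnicodeDecodeError, a ValueError subclass, not the OSError most callers expect from os.listdir.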
> If it was a matter of adding a new method, I might agree. But:
>
> 1. We already have a method that does exactly what you describe. It is only
> a matter of adding flexibility to the response to problems, for which there
> is already precedent.
>
> 2. Suggesting that people who want strings and not bytes should have to deal
> with bytes, just to get an error notification, seems to negate the point of
> moving to 3.0.
>
> 3. A builtin would probably do so better than most programmers would, with
> little touches such as the one suggested below.
>
> 4. An error parameter would ALERT programmers to the possibility of a
> PROBLEM, both in the present and future. As you say below, people need to
> better anticipate the future.
>
>> My problem with raising exceptions *by default* when an undecodable
>> name exists is that it may render an app completely useless in a
>> situation where the developer is no longer around. This happened all
>> the time with the 2.x Unicode API, where the developer hadn't
>> anticipated a particular input potentially containing non-ASCII bytes,
>> and the user fed the application non-ASCII text. Making os.listdir
>> raise an exception when a directory contains a single undecodable file
>> means that the entire directory can't be read, and most likely the
>> entire app crashes at that point. Most likely the developer never
>> anticipated this situation (since in most places it is either
>> impossible or very unlikely) -- after all, if they had anticipated it
>> they would have used the bytes API in the first place. (It's worse
>> because the exception being raised would be UnicodeError -- most
>> people expect os.listdir to raise OSError, not other errors.)
>
> This to me is an argument for keeping the default the current behavior, but
> not for rejecting flexibility. The computing world seems to be messier than
> we would like, and worse than I realized until this week. As you say below,
> people need to better anticipate the future, and an errors parameter would
> help do that.
I'm fine with whatever API enhancements you can come up with (assuming
others like them too :-) as long as the default remains the current
behavior.
> Is Windows really immune? What about when it reads the directory of
> possibly old removable media with whatever byte name encodings? Is this a
> possible source of 'unanticipated' problems?
>
> As to your last sentence, os.listdir() with an errors parameter could
> convert a decoding UnicodeError to "OSError: undecodable file name
> <ascii+hex repr>", thereby supplying the expected exception as well as an
> extractable representation of the problematic raw bytes.
>
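A sketch of the conversion proposed above. `decode_name` is a hypothetical per-name helper; the real os.listdir offers no such hook. A failed strict decode is re-raised as OSError, and the message uses repr() of the raw bytes. That repr is pure ASCII, with printable bytes shown as-is and the rest as \xNN escapes, which fits the "ascii+hex repr" described.

```python
def decode_name(raw, encoding='utf-8'):
    """Hypothetical helper: decode one raw file name, or raise OSError
    carrying an ASCII-only representation of the offending bytes."""
    try:
        return raw.decode(encoding, 'strict')
    except UnicodeDecodeError as exc:
        # repr(bytes) is ASCII-only; the original error is kept as __cause__.
        raise OSError('undecodable file name %r' % (raw,)) from exc

# decode_name(b'caf\xe9') raises OSError: undecodable file name b'caf\xe9'
```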
> Here is a possible use case: I want filenames as 3.0 strings and I
> anticipate no problems at present but, as you say above, something might
> happen years in the future. I am using 3.0 *because* of the strings ==
> unicode feature. I would like to write
>
> try:
>     files = os.listdir(somedir, errors='strict')
> except OSError as e:
>     log(<verbose error message that includes somedir and e>)
>     files = os.listdir(somedir)
>
> and go on without the problem file, but not without logging the problem so
> a future maintainer can consider what to do about it, but only when there is
> an actual need to think about it.
>
> Terry Jan Reedy
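The use case above can be emulated with a wrapper. This is a sketch only: `listdir_errors` and `list_files` are hypothetical names, and the real os.listdir accepts no `errors` argument. With 'ignore', undecodable names are left out, matching the "go on without the problem file" fallback. With 'strict', an OSError is raised so the caller's `except OSError` clause catches it.

```python
import logging
import os
import sys

def listdir_errors(path, errors='ignore', encoding=None):
    """Hypothetical os.listdir variant. 'ignore' skips names that cannot
    be decoded; 'strict' raises OSError naming the undecodable entry."""
    if errors not in ('strict', 'ignore'):
        raise ValueError("errors must be 'strict' or 'ignore'")
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    names = []
    for raw in os.listdir(os.fsencode(path)):  # raw bytes names
        try:
            names.append(raw.decode(encoding, 'strict'))
        except UnicodeDecodeError as exc:
            if errors == 'strict':
                raise OSError('undecodable file name %r in %r'
                              % (raw, path)) from exc
            # errors == 'ignore': leave this entry out.
    return names

def list_files(somedir):
    # Ask for strict decoding; on failure, log for a future maintainer,
    # then carry on without the undecodable entries.
    try:
        return listdir_errors(somedir, errors='strict')
    except OSError as e:
        logging.warning('listing %r: %s', somedir, e)
        return listdir_errors(somedir)
```

The wrapper goes through the bytes API so its behavior does not depend on how the running Python decodes str listings. The standard logging module stands in for the log() placeholder.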
--
--Guido van Rossum (home page: http://www.python.org/~guido/)