[Python-Dev] Python-3.0, unicode, and os.environ

Terry Reedy tjreedy at udel.edu
Tue Dec 9 00:58:09 CET 2008


M.-A. Lemburg wrote:

>> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:

>>> try:
>>>  files = os.listdir(somedir, errors='strict')
>>> except OSError as e:
>>>  log(<verbose error message that includes somedir and e>)
>>>  files = os.listdir(somedir)

> If that error parameter is the same as in unicode(value, errors),
> then this would be a useful feature:

Except that unicode becomes str in 3.0, that is exactly my intention.

> People could then choose among the already existing error handlers
> ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
> their own ones via the codecs module.

These could be passed through from listdir or getenv to str.
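
To make the pass-through concrete, here is a rough sketch of what I have in mind. It is only an illustration: decode_name and the 'private_use_escape' handler are hypothetical names of mine, not anything in os or codecs, and UTF-8 is assumed as the file system encoding. Only str(bytes, encoding, errors) and codecs.register_error are existing API.

```python
import codecs

def decode_name(raw, encoding="utf-8", errors="strict"):
    """Hypothetical helper: decode a raw name, forwarding `errors` to str()."""
    return str(raw, encoding, errors)

def private_use_escape(exc):
    """Hypothetical handler: map each undecodable byte to U+E000 + byte value."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume decoding.
    return "".join(chr(0xE000 + b) for b in bad), exc.end

# Register the handler under a name usable wherever `errors` is accepted.
codecs.register_error("private_use_escape", private_use_escape)

raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

replaced = decode_name(raw, errors="replace")            # bad byte becomes U+FFFD
ignored = decode_name(raw, errors="ignore")              # bad byte is dropped
escaped = decode_name(raw, errors="private_use_escape")  # bad byte kept as U+E0E9
```

With the default 'strict', the same call raises UnicodeDecodeError, which is what the try/except above would catch if listdir turned it into an OSError.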

[Side questions:
1. 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string.
Should it be, or is 'xmlcharrefreplace' an addition for a later version?
2. A garbage value for errors (such as 'blah') is silently ignored (so I 
cannot test the above).  Intended or a bug?]

Someone else proposed a new option, 'warn', which Guido has accepted as
the default in place of the current 'ignore'.  It could not be passed
through (unless str were changed or a handler were registered).  I believe
the implementation would call str with 'strict' but catch
errors and warn instead.  Whether there should be one warning for each
problematic byte string encountered or one for each listdir (or whatever)
call, possibly with the number of problems, I leave to others to decide.
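
A minimal sketch of the one-warning-per-call variant, assuming that undecodable names are left out of the result as they are now. decode_names is a hypothetical helper standing in for whatever listdir would do internally; the choice of UnicodeWarning as the category is also my assumption.

```python
import warnings

def decode_names(raw_names, encoding="utf-8"):
    """Hypothetical helper: decode strictly, skip failures, warn once per call."""
    names = []
    skipped = 0
    for raw in raw_names:
        try:
            names.append(str(raw, encoding, "strict"))
        except UnicodeDecodeError:
            skipped += 1  # count the problem instead of raising
    if skipped:
        warnings.warn(
            "%d name(s) could not be decoded as %s" % (skipped, encoding),
            UnicodeWarning,
        )
    return names
```

Moving the warnings.warn call into the except clause would give one warning per problematic name instead.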

> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.
> 
> Perhaps we should also add an 'encoding' parameter that can be
> set on a per directory basis (if necessary) and defaults to the
> global file system encoding.

That could also be passed through, but I will let others make the
argument for it.
> 
> If an application hits a directory that is known to cause problems,
> it could then choose to receive the file names in a different,
> more suitable encoding. This allows implementing fallback
> mechanisms with a list of common encodings for a locale.
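
Such a fallback can already be approximated on the application side by working with bytes names. The following is only a sketch: decode_with_fallbacks and the particular list of encodings are my own assumptions, not a proposed API.

```python
def decode_with_fallbacks(raw, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: return (text, encoding) for the first encoding that
    decodes `raw` strictly, or (None, None) if none of them does."""
    for encoding in encodings:
        try:
            return str(raw, encoding, "strict"), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    return None, None
```

An application could feed it the bytes names that os.listdir returns when given a bytes path, for example os.listdir(b"."). Note that ending the list with 'latin-1' makes the fallback always succeed, since every byte value decodes in that encoding, whether or not the result is meaningful.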

Terry Jan Reedy



