[Python-Dev] Python-3.0, unicode, and os.environ

Victor Stinner victor.stinner at haypocalc.com
Fri Dec 5 19:20:59 CET 2008


Hi,

> > But they are open questions (already asked in the bug tracker):
>
> I answered these in the bug tracker.  Here are the answers for the
> mailing list:

Oh, sorry. I didn't follow the end of the discussion on the bug tracker.

> >    os.environb['PATH'] = '\xff'
> >    => os.environ['PATH'] = ???
>
>      os.environ['PATH'] => raises KeyError because PATH is not a key in
> the unicode decoded environment.

Ok, good answer :-)
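
A minimal sketch of the answer quoted above, assuming the unicode environment is built by decoding a bytes environment and leaving out any entry that cannot be decoded. The helper name decode_environ and the plain dicts are hypothetical stand-ins for illustration, not the real os module:

```python
def decode_environ(raw_env, encoding):
    """Build a str view of a bytes environment (hypothetical helper).

    Entries whose key or value cannot be decoded are left out, so
    looking them up in the result raises KeyError.
    """
    decoded = {}
    for key, value in raw_env.items():
        try:
            decoded[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            continue  # undecodable entry: reachable only through the bytes view
    return decoded


# Stand-ins for os.environb and os.environ (assumed names, plain dicts here).
environb = {b'PATH': b'\xff', b'HOME': b'/home/haypo'}
environ = decode_environ(environb, 'utf-8')
```

With these stand-ins, environ['HOME'] is available as a str, while environ['PATH'] raises KeyError because b'\xff' is not valid UTF-8.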

> >    os.environ['PATH'] = chr(0x10000)
> >    => os.environb['PATH'] = ???
>
> raise UnicodeEncodeError when setting the value.

Ok, it's consistent with the current behaviour.

$ LANG=C ./python
Python 3.0rc3+ (py3k:67498M, Dec  4 2008, 17:45:54)
>>> import os
>>> os.environ['x'] = '\xff'
>>> os.environ['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/haypo/prog/py3k/Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "/home/haypo/prog/py3k/Lib/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)

Oh, that's strange :-p The error is delayed until we read the value.
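
For contrast with the delayed error, here is a rough sketch of raising UnicodeEncodeError at assignment time, as suggested in the quoted answer. The helper set_env and the dict standing in for the bytes environment are assumptions made for illustration only:

```python
def set_env(raw_env, encoding, key, value):
    """Store a str variable in a bytes environment (hypothetical helper).

    Both encodes happen before the store, so an unencodable key or value
    raises UnicodeEncodeError immediately and raw_env is left unchanged.
    """
    raw_key = key.encode(encoding)
    raw_value = value.encode(encoding)
    raw_env[raw_key] = raw_value


environb = {}  # stand-in for the bytes environment
set_env(environb, 'ascii', 'EDITOR', 'vim')

error = None
try:
    # With an ASCII charset (as with LANG=C), this value cannot be stored.
    set_env(environb, 'ascii', 'PATH', chr(0x10000))
except UnicodeEncodeError as exc:
    error = exc
```

After running this, only EDITOR is stored; the failed assignment leaves no PATH entry behind.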

> > It would be maybe easier if os.environ supports bytes and unicode keys.
> > But we have to keep these assertions:
> >    os.environ[bytes] -> bytes
> >    os.environ[str] -> str
>
> I think the same choices have to be made here.  If LANG=C, we still have
> to decide what to do when os.environ[str] is set to a non-ASCII string.
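
The two assertions quoted above (bytes key gives bytes, str key gives str) could be sketched roughly as follows. DualEnviron is a hypothetical class over a plain dict, and treating an undecodable entry as a missing key is an assumption, which is exactly the kind of choice the reply says still has to be made:

```python
class DualEnviron:
    """Hypothetical mapping: bytes keys give bytes, str keys give str."""

    def __init__(self, raw_env, encoding):
        self._raw = raw_env        # bytes -> bytes, the underlying data
        self._encoding = encoding

    def __getitem__(self, key):
        if isinstance(key, bytes):
            return self._raw[key]
        try:
            raw_value = self._raw[key.encode(self._encoding)]
            return raw_value.decode(self._encoding)
        except UnicodeError:
            # Key or value not representable in the charset:
            # assumed here to behave like a missing key.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        if isinstance(key, str) and isinstance(value, str):
            # Raises UnicodeEncodeError if the charset cannot represent them.
            key = key.encode(self._encoding)
            value = value.encode(self._encoding)
        elif not (isinstance(key, bytes) and isinstance(value, bytes)):
            raise TypeError("key and value must both be bytes or both be str")
        self._raw[key] = value


env = DualEnviron({b'HOME': b'/home/haypo', b'PATH': b'\xff'}, 'ascii')
```

Here env['HOME'] returns a str and env[b'PATH'] returns bytes, while env['PATH'] raises KeyError under the ASCII charset.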

If the charset is US-ASCII, os.environ will drop non-ASCII values. But most 
variables are ASCII only. Examples with my shell:

$ env
XCURSOR_THEME=kubuntu
LANG=fr_FR.UTF-8
EDITOR=vim
HOME=/home/haypo
...

> Additionally, the subprocess question makes using the key value
> undesirable compared with having a separate os.environb that accesses
> the same underlying data.

The user should be able to choose bytes or unicode. Examples:
 - subprocess.Popen('ls') => use unicode environment (os.environ)
 - subprocess.Popen(b'ls') => use bytes environment (os.environb)
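
As a sketch of that selection rule: the function pick_environment below is hypothetical (subprocess is not claimed to work this way), and the two dicts are stand-ins for the unicode and bytes environments:

```python
def pick_environment(command, str_env, bytes_env):
    """Choose the environment that matches the type of the command.

    command may be a single str/bytes or a sequence of arguments, in
    which case the program name (first item) decides.
    """
    if isinstance(command, (str, bytes)):
        program = command
    else:
        program = command[0]
    return bytes_env if isinstance(program, bytes) else str_env


# Stand-ins for os.environ and os.environb.
str_env = {'LANG': 'fr_FR.UTF-8'}
bytes_env = {b'LANG': b'fr_FR.UTF-8'}

chosen_for_str = pick_environment('ls', str_env, bytes_env)
chosen_for_bytes = pick_environment([b'ls', b'-l'], str_env, bytes_env)
```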

> Here's my problem with it, though.  With these semantics any program
> that works on arbitrary files and runs on *NIX has to check
> os.listdir(b'') and do the conversion manually.

Only programs that have to support a strange environment like yours (mixing 
Shift-JIS and UTF-8) :-) Most programs don't have to support such charset 
mixtures.
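
For the programs that do need it, the manual conversion could look roughly like this. The function names split_names and list_names are hypothetical; only os.listdir() with a bytes argument (which returns bytes names) is taken from the discussion:

```python
import os


def split_names(raw_names, encoding):
    """Separate bytes filenames into decoded names and undecodable ones."""
    decoded, undecodable = [], []
    for raw_name in raw_names:
        try:
            decoded.append(raw_name.decode(encoding))
        except UnicodeDecodeError:
            undecodable.append(raw_name)  # keep the raw bytes for the caller
    return decoded, undecodable


def list_names(directory, encoding):
    """List a directory given as bytes and convert the names manually."""
    return split_names(os.listdir(directory), encoding)


# One UTF-8 name and one Shift-JIS name, as in a mixed-charset directory.
sample = [b'caf\xc3\xa9.py', b'\x93\xfa\x96\x7b.txt']
names, leftovers = split_names(sample, 'utf-8')
```

The Shift-JIS name ends up in leftovers as raw bytes, so the program can still open or report the file instead of silently losing it.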

We can imagine a higher-level library working on both UNIX and Windows (bytes 
or Unicode). But that would come later.

> I think the desired behaviour assuming the existence of a nondecodable
> file is this:

I prefer the current behaviour :-)

> Why do you think that glob.glob('*.py') is special and should not traceback?

It's not special. glob() reuses listdir(), and it was an example to show 
that "it just works".

> I just differ in that I think lack of tracebacks when
> UnicodeDecodeErrors are encountered is a wart in python3 that did not
> exist in python2.

Right.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

