[issue4006] os.getenv silently discards env variables with non-UTF-8 values

Toshio Kuratomi report at bugs.python.org
Thu Oct 2 16:32:23 CEST 2008


Toshio Kuratomi <a.badger at gmail.com> added the comment:

It's not a feature, it's a bug! :-)  (I hope you meant to have a smiley
too ;-)

As stated in the os.listdir() related bug, on Unix filesystems,
filenames are sequences of bytes.  The system encoding allows the
user-level tools to display the filenames as characters instead of byte
sequences and allows you to manipulate the filenames using characters
instead of byte sequences.  But if you change your locale, the
user-level tools will interpret the byte sequences as different
characters and will freely let you create files whose names are in a
different encoding.
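
To make the locale point concrete, here is a small sketch (the byte
values are just illustrative picks of mine) showing one byte sequence
read under two encodings, and a Latin-1 sequence that is not valid
UTF-8 at all:

```python
# The same two bytes read under two different encodings.
raw = b"\xc3\xa9"
as_utf8 = raw.decode("utf-8")      # one character: 'é'
as_latin1 = raw.decode("latin-1")  # two characters: 'Ã©'

# A name written under a Latin-1 locale is not valid UTF-8 at all.
legacy = b"caf\xe9"
legacy_text = legacy.decode("latin-1")  # 'café'
try:
    legacy.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(repr(as_utf8), repr(as_latin1), repr(legacy_text), decodes_as_utf8)
```

The bytes on disk never change; only the interpretation does.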

So in order to work correctly on Unix you must be able to accept byte
sequences in place of filenames.
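
As a rough illustration of accepting bytes in place of filenames:
os.listdir() returns bytes entries when it is handed a bytes path.  The
helper name list_names_as_bytes and the sample filename are my own
inventions for the sketch, and it assumes a Python 3 with os.fsencode()
(3.2 or later):

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """Return directory entries as raw bytes, with no decoding step."""
    # A bytes argument makes os.listdir() return bytes entries.
    return os.listdir(os.fsencode(directory))


with tempfile.TemporaryDirectory() as tmp:
    raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
    path = os.path.join(os.fsencode(tmp), raw_name)
    try:
        with open(path, "wb"):
            pass
    except (OSError, ValueError):
        # Some platforms and filesystems refuse names that are not
        # valid in their own encoding; typical Unix filesystems do not.
        created = False
    else:
        created = True
    names = list_names_as_bytes(tmp)

print(created, names)
```

Where the file can be created, its name comes back byte for byte,
whatever the current locale says.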

The sad fact of the matter is that while we can be all Unicode with data
and strings inside of Python, we will always have to be prepared to
handle supposed strings as byte sequences when talking to some things
outside of ourselves.  Sometimes the border has a specification that
tells us what encoding to expect and we can do conversion automatically.
But when it doesn't, we have to be prepared to 1) tell the user that the
data exists even though it isn't the string type expected and 2) make
the byte sequence available to the user.
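
For the environment specifically, here is a sketch of what "make the
byte sequence available" can look like.  It relies on os.environb and
os.supports_bytes_environ, which only exist in Python 3.2 and later, so
it describes later releases and not the behaviour reported in this
issue.  The helper getenv_raw and the variable name DEMO_NON_UTF8 are
made up for the example:

```python
import os


def getenv_raw(name, default=None):
    """Hypothetical helper: return an environment value as bytes."""
    if os.supports_bytes_environ:
        # Unix: the bytes view of the environment is available directly.
        return os.environb.get(os.fsencode(name), default)
    # Elsewhere the environment is natively text; encode on the way out.
    value = os.environ.get(name)
    return default if value is None else os.fsencode(value)


if os.supports_bytes_environ:
    # Store a value that is not valid UTF-8.
    os.environb[b"DEMO_NON_UTF8"] = b"caf\xe9"
    raw = getenv_raw("DEMO_NON_UTF8")
    # The str view still reports the variable instead of dropping it,
    # and encoding it back recovers the original bytes.
    text = os.environ["DEMO_NON_UTF8"]
    round_trip = os.fsencode(text)
    print(raw, round_trip)
```

Both requirements are met in that sketch: the variable is visible
through os.environ, and the exact bytes can be recovered.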

Silently pretending that the data doesn't exist at all is a bug (maybe a
minor bug depending on how often we expect the situation to arise but
still a bug.)

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4006>
_______________________________________

