Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Nobody nobody at nowhere.com
Wed Dec 1 17:45:53 EST 2010


On Wed, 01 Dec 2010 10:34:24 +0100, Peter Otten wrote:

>> Python 3.x's decision to treat filenames (and environment variables) as
>> text even on Unix is, in short, a bug. One which, IMNSHO, will mean that
>> Python 2.x is still around when Python 4 is released.
> 
> For filenames in Python 3 the user has the choice between "text" (str) and 
> bytes. If the user chooses text that will be converted to bytes using a 
> default encoding that hopefully matches that of the other tools on the 
> machine that manipulate filenames. 

However, sys.argv and os.environ are automatically converted to text. If
you want bytes, you have to convert them back explicitly.
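
As a minimal sketch of that explicit conversion, assuming Python 3.2 or later, where os.fsencode() and os.fsdecode() are available (3.1 needs a manual .encode() with the filesystem encoding and the "surrogateescape" handler). The helper name argv_as_bytes is hypothetical, not part of any library:

```python
import os
import sys

def argv_as_bytes(args):
    """Turn already-decoded arguments back into the bytes the OS supplied.

    Hypothetical helper for illustration. os.fsencode() uses the
    filesystem encoding with the "surrogateescape" error handler on
    POSIX systems, so undecodable bytes survive the round trip.
    """
    return [os.fsencode(arg) for arg in args]

if __name__ == "__main__":
    for raw in argv_as_bytes(sys.argv[1:]):
        print(repr(raw))
```

On POSIX systems, 3.2 also provides os.environb as a bytes view of the environment, which avoids the conversion for environment variables.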

Also, I'm unsure as to how far the choice between bytes and str will
extend beyond the core modules.

> I see that you may run into problems with the text approach when you 
> encounter byte sequences that are illegal in the chosen encoding.

This was actually a critical flaw in Python 3.0, as it meant that
filenames which weren't valid in the locale's encoding simply couldn't be
passed via argv or environ. 3.1 fixed this using the "surrogateescape"
error handler (PEP 383), so now it's only an annoyance (i.e. you can
recover the original bytes once you've spent enough time digging through
the documentation).
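
A short sketch of the round trip, assuming UTF-8 as the encoding purely for illustration (the real one is whatever the locale selects). The filename is made up:

```python
# A filename in Latin-1 bytes; 0xE9 is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# Decoding with "surrogateescape" maps each undecodable byte 0xNN to the
# lone surrogate U+DCNN instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the surrogate back into the byte.
recovered = raw if False else text.encode("utf-8", "surrogateescape")

# A plain strict encode refuses the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The resulting str can be passed to open() and friends as usual, but it cannot be printed or written to a strictly encoded stream without handling the surrogate first.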

There could be a problem with encodings which aren't invertible (e.g.
ISO-2022), but those tend to be quite rare and Python flat-out doesn't
support those as system encodings anyhow.
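
To illustrate what "not invertible" means here, a small sketch using Python's iso2022_jp codec as a stand-in for the ISO-2022 family. Because the encoding is stateful, a redundant escape sequence gives two different byte strings that decode to the same text, so re-encoding cannot reproduce both:

```python
plain = b"abc"
# ESC ( B selects ASCII, which is already the initial state, so this
# escape sequence changes nothing about how the following bytes decode.
redundant = b"\x1b(Babc"

text_a = plain.decode("iso2022_jp")
text_b = redundant.decode("iso2022_jp")

# Both inputs yield the same str; encoding produces only the short form,
# so the original bytes of `redundant` are lost.
reencoded = text_b.encode("iso2022_jp")
```

A filename stored as the second byte string could therefore not be reopened after a decode/encode round trip.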