[python-committers] 3.0rc2 schedule

Thu Oct 2 19:53:30 CEST 2008

On Thu, Oct 2, 2008 at 10:08 AM, Fred Drake <fdrake at acm.org> wrote:
> On Oct 2, 2008, at 9:39 AM, Nick Coghlan wrote:
>>
>> If you don't make a habit of borking your own filesystems with dodgy
>> filenames, it runs fine.
>
> I really hope the individuals making this argument are being facetious.  I
> don't think this is the source of the problem at all.
>
> I expect the most common occurrence of the problem comes from sharing of
> drives between operating systems and individual configurations; those
> ubiquitous little USB "thumb" drives get shared between all kinds of
> computers these days as people share files they don't want to or can't pass
> over a network for whatever reason.  (Those drives might actually serve
> other purposes first, such as being music players, and so may have no other
> interfaces for transferring files.)
>
> If someone hands me a USB flash drive with filenames encoded in whatever is
> reasonable for them, I should be able to use Python tools on the files
> without having to use non-Python tools to copy or rename the file first.
>  The possibility of a conflicting encoding is increased if the source
> machine is configured to use a very different encoding, clearly, but that's
> not that unusual.
>
> The world is smaller than it used to be, and we really need to understand
> that.

All good points.

However no matter how you spin it, we're in a hard place. If we
maintain that filenames should always be represented as text strings,
we have no choice but to come up with a way of encoding all possible
byte sequences into Unicode strings, using a reversible encoding. This
has been shown to be hard no matter what encoding you favor -- as soon
as those "Unicode" strings are passed on to other libraries or
programs nobody is sure how they will be treated.
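
To make the reversibility problem concrete, here is a minimal Python
sketch. The byte string `raw` is a hypothetical filename made up for
illustration, and the three decoding strategies are just the obvious
candidates, not a proposal from this thread.

```python
# Hypothetical filename bytes: "café.txt" encoded as Latin-1. This is
# not valid UTF-8, because 0xE9 opens a multi-byte sequence that the
# following "." does not continue.
raw = b"caf\xe9.txt"

# 1. A strict decode with the "wrong" encoding fails outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 2. A lossy decode succeeds, but the original bytes are gone:
#    the bad byte becomes U+FFFD and cannot be recovered.
lossy = raw.decode("utf-8", errors="replace")
lossy_roundtrip = lossy.encode("utf-8")

# 3. Latin-1 maps every byte to a code point, so it always round-trips,
#    but it shows the wrong characters for names written in any other
#    encoding (here, a UTF-8 encoded "é" read back as Latin-1).
latin_roundtrip = raw.decode("latin-1").encode("latin-1")
mojibake = "\u00e9".encode("utf-8").decode("latin-1")
```

Only the third strategy is reversible, and it buys that by handing
misleading text to whatever library or program receives the string
next.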

If we switch to the view that all filenames are bytes after all,
Windows loses, because not all filenames are representable
that way (unless you deviate from the encoding that Windows has chosen
for you, which presents other problems). Also, it would be a *huge*
project, since filenames are so ubiquitous.

There are a number of ways out, but I don't think we'll be able to
come up with a solution without doing a lot of experimentation.
Therefore I believe the best thing to do is to release 3.0 with a
low-level solution that makes it possible to carry out those
experiments. I am hoping that Martin will check in his
sys.setfilesystemencoding() function, and I am working on Victor
Stinner's code for better supporting filenames-as-bytes (in addition
to, not instead of, filenames-as-text), and I expect that these two
together will allow the necessary experimentation to take place
post-3.0.
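
As a rough illustration of what "filenames-as-bytes in addition to
filenames-as-text" can look like from user code, here is a small
sketch. It relies on the behaviour of present-day Python 3, where
os.listdir() returns names of the same type as its argument; it is not
necessarily what the code under discussion here does. The file name
"example.txt" is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file with a plain ASCII name (hypothetical).
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("data")

    # Text path in, text names out.
    text_names = os.listdir(tmp)

    # Bytes path in, bytes names out: the names come back undecoded,
    # so a name in an unexpected encoding is not rejected or altered.
    byte_names = os.listdir(os.fsencode(tmp))
```

The bytes form leaves it to the caller to decide how, or whether, to
decode each name.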

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)