[Python-Dev] Proposal to revert r54204 (splitext change)

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Fri Mar 16 19:57:53 CET 2007

"Martin v. Löwis" writes:

 > Phillip J. Eby schrieb:

 > > Some other options:
 > > 
 > > 1. Deprecate splitext() and remove it in 3.0
 > How would that help the problem? Isn't it useful to have a function
 > that strips off the extension?

No.  It's useful to have a function that performs a well-specified
algorithm on file names containing dots, but (except on certain file
systems such as FAT) "the extension" does not uniquely exist.  People
and programs will disagree on the decomposition of "file.tar.gz".  On
FAT file systems, it's defined incorrectly according to you.  As IIRC
glyph pointed out, if you're going to include either shell semantics
(dotfiles) or content semantics (file "type" for a generic "open
anything" command) in the specification of "file extension", what I
prefer is guess_file_type(), not splitext().
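
To make the disagreement over "file.tar.gz" concrete, here is a minimal sketch of two equally well-specified decompositions. Both helper names are hypothetical (neither is a stdlib function), and both assume a bare file name with no directory component.

```python
def split_at_last_dot(name):
    """Hypothetical helper: only the final suffix counts as the extension."""
    base, dot, ext = name.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the base.
        return name, ""
    return base, dot + ext


def split_at_first_dot(name):
    """Hypothetical helper: everything from the first dot on is the extension."""
    base, dot, ext = name.partition(".")
    return base, dot + ext


for name in ("file.tar.gz", ".bashrc", "README"):
    print(name, split_at_last_dot(name), split_at_first_dot(name))
```

Each rule is simple to state, yet they disagree on "file.tar.gz", and neither says anything about what type of file it is.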

To put it more emphatically: I would never use a library function
whose semantics were defined as "split a file name into the base and
the extension", because I would expect gratuitous backward
incompatibility of the kind you have introduced (and it could go
either way).[1]

N.B. Backward compatibility can be defined by reference to an
implementation (often denigrated as "bug compatibility") or to a
specification.  This change is backward incompatible with respect to
the implementation and the docstring specification.

I would personally prefer the 2.4 definition of splitext(), merely
because it's so simple.  I would (absent this long discussion<wink>)
always have to look up the treatment of dotfiles, anyway, and my own
only use (in one function, 3 calls) of splitext is precisely

from os.path import splitext

def versioned_file_name(filename, version):
    # Insert the version between the base name and the extension.
    base, ext = splitext(filename)
    return "%s.%s%s" % (base, version, ext)
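
As an illustration of why the dotfile treatment matters even for a function this small, the sketch below runs it on a few names. The sample file names are made up, and the results assume an interpreter whose os.path.splitext ignores a leading dot (as current CPython does). Under a definition that gives ".bashrc" an empty base, the last call would instead yield ".2.bashrc".

```python
from os.path import splitext


def versioned_file_name(filename, version):
    # Insert the version between the base name and the extension.
    base, ext = splitext(filename)
    return "%s.%s%s" % (base, version, ext)


# Sample names are illustrative only.
print(versioned_file_name("report.txt", 2))
print(versioned_file_name("archive.tar.gz", 2))
print(versioned_file_name(".bashrc", 2))
```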

 > > 2. Add an optional flag argument to enable the new behavior
 > How would that help backwards compatibility?

As Steve Holden points out, by preserving it if the flag is omitted.
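
A rough sketch of what such a flag could look like follows. The wrapper name splitext_compat and the parameter ignore_leading_dot are hypothetical, not anything proposed verbatim in this thread, and the legacy branch assumes POSIX-style "/" separators only.

```python
import os.path


def splitext_compat(path, ignore_leading_dot=False):
    """Hypothetical wrapper; not a stdlib API."""
    if ignore_leading_dot:
        # Opt in to the behavior where a leading dot belongs to the base
        # (what current CPython's os.path.splitext does).
        return os.path.splitext(path)
    # Default: split at the last dot that follows the last "/".
    sep_index = path.rfind("/")
    dot_index = path.rfind(".")
    if dot_index <= sep_index:
        return path, ""
    return path[:dot_index], path[dot_index:]


print(splitext_compat(".bashrc"))
print(splitext_compat(".bashrc", ignore_leading_dot=True))
```

Callers who never pass the flag keep getting the last-dot split, which is the sense in which the flag preserves compatibility.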

That is so obvious that I think merely asking that question is
perverse.  You seem to be granting official status to the unwritten
and controversial "intuitive" specification that many programmers
guess from the name.  That is way out of line with any sane
interpretation of "compatibility with past versions of Python".

I think all of the advocates of changing the function rather than the
library reference are being very short-sighted.  I agree with you
that, looking at this one case alone, it will be very expensive for
everyone whose (currently broken) code expects splitext to treat
dotfiles as having a base that starts with a dot (rather than an empty
base) to switch to a new function.  (I think the realistic solution
for them is monkeypatching.)

But using this to justify the backward incompatibility is like
thinking you can eat just one potato chip, and so not go off your
diet.  The incompatibility costs of applying this greedy algorithm to
all cases where the Python specification differs from common intuition
will not merely be very expensive, they will be astronomically so --
but practically invisible because the cost will be in terms of
developers who refuse to upgrade their applications to take advantage
of new features of Python because their own library code gets broken.

The only path I can see where it makes sense to make this change is as
an appendix to glyph's PEP.  The appendix could give a list of such
changes and the decision, relative to some base version.  Or it could
specify that contributions to 2.6 can be backward incompatible, but
not afterward.  In that case the promise to eat only this one potato
chip becomes credible.

I prefer the explicit list approach, which would force the discussion
to occur in one place, so that both proponents and opponents of each
change would be made aware of how many such changes are being made.

