[issue14626] os module: use keyword-only arguments for dir_fd and nofollow to reduce function count

Larry Hastings report at bugs.python.org
Sat May 26 06:54:46 CEST 2012


Larry Hastings <larry at hastings.org> added the comment:

Here's my first pass at a patch.  For this patch,
I took the proposal to its logical extreme: I removed
every function in os that was both mildly redundant
with an existing function *and* has been added since
3.2, and moved that functionality to the equivalent
existing function, making it accessible with the use
of keyword-only parameters.

Specifically:

This function has been removed, and instead
|              this parameter has been added to
|              |                this function
|              |                |
v              v                v
-------------------------------------
faccessat      dir_fd           access
faccessat      effective_ids    access
faccessat      follow_symlinks  access
fchmodat       dir_fd           chmod
fchmodat       follow_symlinks  chmod
fchownat       dir_fd           chown
fchownat       follow_symlinks  chown
fexecve        fd               execve
fgetxattr      fd               getxattr
flistdir       fd               listdir
flistxattr     fd               listxattr
fremovexattr   fd               removexattr
fsetxattr      fd               setxattr
fstatat        dir_fd           stat
futimens       fd               utime
futimes        fd               utime
futimesat      dir_fd           utime
lgetxattr      follow_symlinks  getxattr
linkat         dst_dir_fd       link
linkat         src_dir_fd       link
linkat         follow_symlinks  link
llistxattr     follow_symlinks  listxattr
lremovexattr   follow_symlinks  removexattr
lsetxattr      follow_symlinks  setxattr
lutimes        follow_symlinks  utime
mkdirat        dir_fd           mkdir
mkfifoat       dir_fd           mkfifoat
mknodat        dir_fd           mknod
open           dir_fd           openat
readlinkat     dir_fd           readlink
renameat       dst_dir_fd       rename
renameat       src_dir_fd       rename
symlinkat      dir_fd           symlink
unlinkat       dir_fd           unlink
unlinkat       remove_directory unlink
utimensat      dir_fd           utime
utimensat      follow_symlinks  utime

Additionally, we *could* deprecate this function,
|              as I have added this parameter
|              |                to this function:
|              |                |
v              v                v
--------------------------------------
fchdir         fd               chdir
fchmod         fd               chmod
fstat          fd               stat
fstatvfs       fd               statvfs
lchflags       follow_symlinks  chflags
lchmod         follow_symlinks  chmod
fchown         fd               chown
lchown         follow_symlinks  chown
lstat          follow_symlinks  stat

I doubt we'll ever deprecate those functions.
This patch does not deprecate those functions.
I don't propose deprecating those functions.


Notes:

* What do you do on platforms where the functionality isn't available?
  I believe it's better to always accept parameters, but throw a
  NotImplementedError if the functionality they represent is
  unavailable on the current platform.  Adding and removing
  parameters based on the current platform... that way lies madness.
  (It's like scrollbars that pop in and out of existance depending
  on whether or not you need them.  Better that the scrollbars are
  always there, and simply disabled when the content fits in the
  current window.  Users like a stable foundation under their feet.)

* The patch is... pretty big.  But you can divide-and-conquer it
  into a series of self-contained parts.  First I add path_converter,
  then I modify existing functions / remove new functions.  Each of
  those can be reviewed in isolation.

  Also, the new implementations generally follow the same pattern:
    initialize
    call PyArg_ParseTupleAndKeywords
    error out early if user asks for functionality
      unavailable on the current platform
    ensure that combinations of parameters
      (dir_fd, fd, follow_symlinks) are permitted
    do actual work, turning on/off advanced functionality
      based on configure ifdefs (HAVE_FCHOWNAT etc)
    if error, raise exception
    compute return value
    exit:
    cleanup all path variables
    return return_value

  Here's a list of all the functions I added arguments to:
    access chdir chflags chmod chown execve getxattr link listdir
    listxattr mkdir mkfifo mknod open readlink removexattr rename
    setxattr stat statvfs symlink unlink utime

* The one new bit of technology: a PyArg_ParseTuple "converter"
  function called path_converter.  Here's its documentation:

  /*
   * A PyArg_ParseTuple "converter" function
   * that handles filesystem paths in the manner
   * preferred by the os module.
   *
   * path_converter accepts (Unicode) strings and their
   * subclasses, and bytes and their subclasses.  What
   * it does with the argument depends on the platform:
   *
   *   * On Windows, if we get a (Unicode) string we
   *     extract the wchar_t * and return it; if we get
   *     bytes we extract the char * and return that.
   *
   *   * On all other platforms, strings are encoded
   *     to bytes using PyUnicode_FSConverter, then we
   *     extract the char * from the bytes object and
   *     return that.
   *
   * Input fields:
   *   path.nullable
   *     If nonzero, the path is permitted to be None.
   *   path.function_name
   *     If non-NULL, path_converter will use that as the name
   *     of the function in error messages.
   *     (If path.argument_name is NULL it omits the function name.)
   *   path.argument_name
   *     If non-NULL, path_converter will use that as the name
   *     of the parameter in error messages.
   *     (If path.argument_name is NULL it uses "path".)
   *
   * Output fields:
   *   path.wide
   *     Points to the path if it was expressed as Unicode
   *     and was not encoded.  (Only used on Windows.)
   *   path.narrow
   *     Points to the path if it was expressed as bytes,
   *     or it was Unicode and was encoded to bytes.
   *   path.length
   *     The length of the path in characters.
   *   path.object
   *     The original object passed in.
   *   path.cleanup
   *     For internal use only.  May point to a temporary object.
   *     (Pay no attention to the man behind the curtain.)
   *
   *   At most one of path.wide or path.narrow will be non-NULL.
   *   If path was None and path.nullable was set,
   *     both path.wide and path.narrow will be NULL,
   *     and path.length will be 0.
   *   
   *   path_converter takes care to not write to the path_t
   *   unless it's successful.  However it must reset the
   *   "cleanup" field each time it's called.
   *
   * Use as follows:
   *      path_t path;
   *      memset(&path, 0, sizeof(path));
   *      PyArg_ParseTuple(args, "O&", path_converter, &path);
   *      // ... use values from path ...
   *      path_cleanup(&path);
   *
   * (Note that if PyArg_Parse fails you don't need to call
   * path_cleanup().  However it is safe to do so.)
   */
  typedef struct {
      char *function_name;
      char *argument_name;
      int nullable;
      wchar_t *wide;
      char *narrow;
      Py_ssize_t length;
      PyObject *object;
      PyObject *cleanup;
  } path_t;

  I assert path_converter is Very Useful.  It nearly always
  reduced the argument processing from three code paths
  (two for Windows, one for everyone else) to one.  Even if
  the rest of the patch isn't accepted I'm sure we'll keep
  path_converter.

* In a lot of places I combined together several functions, or
  several large blobs of #ifdef'd code, into one large
  slightly-shaggy function.  Like, Windows code is now often
  streamlined in, instead of being a big separate #ifdef.  And
  instead of three passes at decoding arguments (one for Windows
  wide, one for Windows narrow, one for everyone else) there is
  generally just one.  (Thanks, path_converter!)  Again, even
  if we don't keep the extra keyword arguments / functionality,
  I'm hoping at least some of that cleanup will survive.

* utime is the whale.  It obviates *five* other functions
  (futimens, futimes, futimesat, lutimes, and utimensat).
  I spent a fair amount of time refactoring it.  It's so complex,
  it was hard to make it anything like readable.  I like to
  think I reached a local maximum of readability given what I had
  to work with, and I like to think that the resulting dog's
  breakfast is less unreadable than the previous iteration.  Anyway
  I'm definitely open to suggestions on how to restructure it
  further to enhance readability.

* The tests used to use e.g. hasattr(os, 'futimes') and the like to
  determine what functionality was available.  In order to LBYL,
  I had to resort to e.g. sysconfig.get_config_var('HAVE_FUTIMES').
  If users need to LBYL here too, well, it's kind of a problem.
  (I hope function signature objects solve this--we're talking
  about it.)

* On a very related topic: os.openat() is one of the functions I
  removed.  And then we have this in Lib/os.py:
      if _exists('openat'):
          def fwalk(...)
  Since I removed os.openat completely, we could longer tell
  whether or not the functionality is available.  And I couldn't
  use my sysconfig.get_config_var trick because it's os.py and
  we're bootstrapping the interpreter.  So I went with a dirty
  hack--uh, I mean, practicality beats purity.  posixmodule.c
  conditionally exposes a symbol called _HAVE_OPENAT if openat()
  is available, and fwalk() is now gated on that.

* A minor semantic change: symlink() now accepts the same arguments
  everywhere.  Previously it had an extra optional argument only on
  Windows ("target_is_directory").  Non-Windows now accepts that too
  and ignores it, in much the same way that os.mkdir ignores the mode
  on Windows.  Also, os.symlink now accepts byte strings for the
  paths on Windows.  (Even though we'll remove byte string paths
  in 3.4, yes?)

* The patch is still sloppy:
    * The docstrings are incomplete.
    * I haven't touched the docs at all.
    * There are > 80 col lines.
  Clearly I'll clean up this stuff if the patch has a shot at
  going in.

I do think it's an improvement.  But I won't check any part of it
in without some consensus (or BFDL ruling).

I look forward to your feedback!

----------
assignee:  -> larry
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file25716/larry.os.keyword.arguments.collapse.1.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14626>
_______________________________________


More information about the Python-bugs-list mailing list