[Python-ideas] Making the stdlib consistent again

Mon Jul 25 13:55:27 EDT 2016

Hi python-ideas,

As you all know, the Python stdlib can be inconsistent and sometimes
surprising in how it names things. This is mostly because several modules
were developed before the introduction of PEP 8, and now we're stuck with
the older naming within these modules.

It has been said and discussed in the past [1][2] that the stdlib is in
fact inconsistent, but fixing this has almost always been dismissed as
being too painful (after all, we don't want a new Python 3 all over again).
However, this way, we will never move away from these inconsistencies.
Perhaps this is fine, but I think we should at least consider providing
function and class names that are unsurprising for developers.

While maintaining full backwards compatibility, my idea is that we should
offer consistently named aliases in (eventually) all stdlib modules. For
instance, with Python 2.6, the threading module received this treatment,
but unfortunately this was not expanded to all modules.
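
To illustrate that precedent: as far as I know, the 2.6 change added PEP 8
style spellings such as threading.current_thread and threading.active_count
alongside the original currentThread and activeCount. The sketch below only
calls the newer spellings; whether the older ones are still available depends
on the Python version, so they appear only in comments.

```python
import threading

# PEP 8 style spellings (older camelCase spellings noted in comments).
main = threading.current_thread()      # older spelling: currentThread()
running = threading.active_count()     # older spelling: activeCount()

worker = threading.Thread(target=lambda: None, name="worker")
worker.start()
worker.join()

print(main.name)
print(running >= 1)
print(worker.is_alive())  # False, because the worker has been joined
```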

What am I speaking of precisely? I have done a quick survey of the stdlib
and found the following examples. Please note, this is a highly opinionated
list; some names may have been chosen with a very good reason, and others
are just a matter of taste. Hopefully you agree with at least some of them:

  * The camelCase function and method names in some modules are the most
obvious culprits, e.g. logging.getLogger and unittest.TestCase.assertEqual.
There is obviously an issue regarding subclasses and
methods that are supposed to be overridden, but I feel we could make it
work.

  * All lower case class names, such as collections.defaultdict and
collections.deque, should be CamelCased. Another example is datetime, which
uses names such as timedelta instead of TimeDelta.

  * Names that are inconsistent altogether, such as re.sub, which I feel
should be re.replace (cf. str.replace). Likewise, there are re.finditer and
re.findall, but no re.find.

  * Names that do not reflect actual usage, such as ssl.PROTOCOL_SSLv23,
which in fact cannot be used as a client for SSLv2.

  * Underscore usage, such as tarfile.TarFile.gettarinfo (should it not be
get_tar_info?), http.client.HTTPConnection.getresponse vs set_debuglevel,
and pathlib.Path.samefile vs pathlib.Path.read_text. And is it
pkgutil.iter_modules or is it pathlib.Path.iterdir (or re.finditer)?

  * Usage of various abbreviations, such as in filecmp.cmp.

  * Inconsistencies between similar modules, e.g. between
tarfile.TarFile.add and zipfile.ZipFile.write.

These are just some examples of inconsistent and surprising naming I could
find; other categories are probably also conceivable. Another subject for
reconsideration would be attribute and argument names, but I haven't looked
for those in my quick survey.

For all of these inconsistencies, I think we should provide a consistently
named alternative and make the original name an alias for it (or the other
way around), without defining a deprecation timeline for the original
names. This should make it possible to eventually make the stdlib
consistent, Pythonic and unsurprising.
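
A minimal sketch of what such aliasing could look like, and of the subclass
problem mentioned in the list above. Every new name here (DefaultDict, Deque,
replace, set_up) is hypothetical and does not exist in the stdlib, and
LegacyCase is only a toy stand-in for a class like unittest.TestCase.

```python
import collections
import re

# Hypothetical PEP 8 style aliases; none of these names exist in the stdlib.
DefaultDict = collections.defaultdict
Deque = collections.deque
replace = re.sub  # hypothetical stand-in for a proposed re.replace


class LegacyCase:
    """Toy stand-in for a class with a camelCase hook method."""

    def setUp(self):
        return "base setUp"

    # Naive class-level alias: bound to the base function at definition time.
    set_up = setUp


class OldStyleTest(LegacyCase):
    # Existing user code overrides only the old name.
    def setUp(self):
        return "overridden setUp"


counts = DefaultDict(int)
counts["a"] += 1
print(dict(counts))                    # {'a': 1}
print(replace(r"\d+", "N", "a1b22"))   # aNbN
print(OldStyleTest().setUp())          # overridden setUp
print(OldStyleTest().set_up())         # base setUp: the alias misses the override
```

Module-level aliases for functions and classes are cheap, since both names
refer to the same object. The last line shows why methods that are meant to be
overridden need more care: a plain alias does not follow an override of the
old name, so one spelling would have to dispatch to the other.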

What would you think of such an effort?

Regards,
Ralph Broenink

 [1] https://mail.python.org/pipermail/python-ideas/2010-January/006755.html
 [2] https://mail.python.org/pipermail/python-dev/2009-March/086646.html