I object to any abbreviations *at all* in the docs. This post lays
out my argument against abbreviations, gives two alternatives, and
proposes a poll to resolve the issue. There is a comment on numpy API
maturity at the end. I'm sorry that it's long. I think we're making
an important step with this decision. We need to make sure that it's
the right one, and that the community buys that it's the right one so
that the issue stays resolved.
What we are writing is the formal documentation to a software package,
not a bunch of informal recipes. How we code the doc examples is
correctly perceived as how we recommend everyone write their own code.
Is there any other place in Python (or indeed in computer science)
where a package advocates referring to itself by something other than
its own name, and documents itself that way? Certainly such cases are
few. Doing so is a loud community declaration that the package
authors made a serious mistake in calling it something the community
can't actually tolerate using, even in writing formal documentation,
where a few extra characters are not a big fraction of the effort. I
don't like to think that this is the case, but more on that later.
Aside from the embarrassment of declaring a mistake on every page of
documentation, the proposed path makes the documentation not work for
several classes of users who should by all rights have it work for
them. These users have a stronger claim than those for whom
abbreviations are merely a convenience, who admittedly are most of us
(including me). They include:
1. Those who were using np or sp or plt as variables. These are *not*
reserved words, and we have seen that a substantial number of people
use them. It is their legal *right* to do so, and to expect that
everything, including the examples in the docs, will work for them
when they do! The alias import would wipe out their legitimate
variables.
2. Those not using IPython, for whatever reason. Numpy and scipy are
Python packages, not IPython packages (I don't think there are any
pure IPython packages). We must always support using the packages,
and their docs, in native and unpolluted Python sessions, because all
code is compiled and run by Python. Someone using Python natively
because they want to be as close to the metal as possible should not
have to sacrifice access to any part of the docs to do that. If they
need to import numpy some other way because their code depends on it,
they can't run the doc examples. (Don't get me wrong, IPython is
great, and I use it.)
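
The name clash described in item 1 can be sketched as follows. This is
only an illustration under stated assumptions: the stdlib module math
stands in for numpy so the snippet runs without numpy installed, and
the variable names (np as a point count, total_samples) are
hypothetical.

```python
import types

# Hypothetical user code: "np" is an ordinary variable (number of points).
np = 100
total_samples = np * 4  # works: np is an int here

# A pasted doc example that begins with the alias import rebinds the name.
# Assumption: math is a stand-in for numpy ("import numpy as np").
import math as np

# The user's integer is gone; np now refers to a module.
np_is_module = isinstance(np, types.ModuleType)

# Re-running the user's earlier expression now fails.
try:
    np * 4
    old_code_still_works = True
except TypeError:
    old_code_still_works = False

print(total_samples, np_is_module, old_code_still_works)
```

The import statement is just an assignment to a name, so it replaces
whatever the session already had under that name without any warning.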
There is a third problem that could come later. Currently, it is
likely the case that no other Python package is called "np" or "sp".
Certainly no future package will be called "numpy", but can we be sure
no package will call itself "np"? I don't think we can guarantee
that if it's just a convenient abbreviation for us. If they do, the
abbreviation we choose now will conflict with that package in the
future, and the other package would win because convenience
abbreviations are distant second-class citizens to package names (and
likely because numpy is a rather niche use and has a correspondingly
small representation among all Python users). We'd then have to
change the docs, and retrain the entire community from our entrenched
use habit to something else (and reopen the debate as to what that
name would be). Whoever in our community needed the new np package
would have to rewrite old code to get it to work, if they followed the
suggested convention.
The behavior we are engaging in is arranging a convenience for the
majority without regard for the minority. Producers do this all the
time and it invariably ends up having negative repercussions for
some. Consider how similar these sound:
"run in IPython"
"use these abbreviations"
"only use xxx browser"
"no support for Linux/Mac/whatever"
"the only supported client is Outlook"
Each of these decisions loses only a small minority, but do it enough
times and our remaining "majority" looks small indeed. These
expedience decisions invariably restrict flexibility and alienate
users, most of whom will never be known to us. We must therefore
avoid making such decisions wherever possible, and it *is* possible in
this case.
That we are nonetheless doing it so that a dozen people writing docs
don't have to type just three more characters per function call is
baffling to me. These are not lazy people. So, I'm being driven to
accept something deeper, that a substantial fraction of the community
*is* saying that typing "numpy." is unacceptable, even in the formal
documentation, even in the numpy code itself, not out of laziness but
because it pollutes the code, or their thinking about the code as they
write it. If this is true, it is a declaration that the package is
either misnamed or that our use recommendation is drastically wrong,
and we need to change accordingly. So here are some possibilities:
1. Let's take the simpler case first, the idea that typing "numpy." is
unacceptable, but that "np." might be ok. Then, the package really
*should* have been called "np" and not "numpy". If this is really the
case, then we should recognize that the project is still in its
infancy, and that we should fix the problem in release 2.0 by calling
the package "np" (still NumPy when written in text), and by its 1.0
release call SciPy "sp". For backward compatibility, we should
reserve the words "numpy" and "scipy", which nobody now uses as
variables, so that people can do
import np as numpy
import sp as scipy
at the top of their old code. This is a radical suggestion; see below
for a comment on radical API changes. I do not advocate this
suggestion, but I think you must accept it (or a variant) if you are
simply unwilling to type "numpy.". Call it the "modest proposal",
with apologies to Jonathan Swift.
2. Alternatively, we may accept that typing *anything* before the
function call name is at best pollution and possibly also obfuscation.
In what other programming language do you say
y = weij.sin(x)
instead of
y = sin(x)
? None I can think of, and many people find it particularly onerous
in interactive mode. So, why not be linguists, and recognize that how
people actually want to use a language *is* its grammar? We should
thus encourage
from numpy import xxx, yyy, zzz
for all functions in a given chunk of code. This would allow direct
references to the function calls without a name in front of them. For
the docs, that would mean putting that line at the top of the examples
section of each docstring (to combat the disdained "import *"). This
would make the code in the examples look a lot cleaner and clearer.
It would simplify our personal code, too. This is the solution I
favor.
N1. I am absolutely opposed to any solution that *implies* a peculiar
import, such as that if numpy.fft.fft is in the top-level namespace,
then so is numpy.fft.rfft. New users will be lost with this
assumption since importing and namespaces (especially with
abbreviations in scope) are not familiar to them. They will then not
be able to run the doc examples they need in order to learn the
package.
N2. I also oppose any solution that encourages one usage interactively
and another in code. While anyone may choose such a path for
themselves, and there may be good reasons for doing so, it is an
unacceptable extra layer of complication for new users for us to be
advocating such an approach.
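
A minimal sketch of a docstring example that carries its own explicit
import, so that it implies nothing about the reader's session. The
assumptions are mine: math stands in for numpy so the snippet runs
anywhere, and hypotenuse and check_examples are hypothetical names, not
anything in numpy. The examples are run with the stdlib doctest module
in a namespace that holds only the function itself.

```python
import doctest
from math import sqrt  # assumption: math stands in for numpy


def hypotenuse(a, b):
    """Return the hypotenuse of a right triangle (hypothetical function).

    Examples
    --------
    >>> from math import sqrt
    >>> sqrt(3.0 ** 2 + 4.0 ** 2)
    5.0
    >>> hypotenuse(3.0, 4.0)
    5.0
    """
    return sqrt(a * a + b * b)


def check_examples(func):
    """Run func's docstring examples in a namespace holding only func."""
    globs = {func.__name__: func}  # nothing else is pre-imported
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, globs, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted


results = check_examples(hypotenuse)
print(results.failed, results.attempted)
```

If the import line were dropped from the docstring, the sqrt example
would fail with a NameError in this bare namespace, which is the
situation a new user in a plain session would be in.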
The decision we are making in this debate is a big one. The examples
in the doc *are how we recommend people code*. Since that affects
everyone and not just code in numpy itself, the decision should be
made by the community as a whole. I propose the following:
We discuss this through Wednesday 4 June. On Thursday we put up a
poll, keep it open through Wednesday 11 June, and stick with the
results. The poll should declare the project's recommendation for how
people code, and that recommendation should be reflected in the docs.
Since there is a group that might need protection, and since I know
many of us are comfortable with more than one solution, I propose this
poll:
Q1: True or False:
I have a significant code investment in the variable(s) 'np', 'sp', or
'plt'; use of one or more of these as an abbreviation or package name
would hurt my work seriously.
Q2: Distribute points among the following. These will be normalized
per person and tallied per option. The option with the most
cumulative points wins. Entries not consisting solely of numbers are
treated as 0:
1. Keep the names "numpy" and "scipy", use no abbreviations, recommend
full names, as in:
import numpy
y = numpy.sin(x)
2. Keep the names "numpy" and "scipy", use "np" and "sp"
abbreviations, as in:
import numpy as np
y = np.sin(x)
3. Keep the names "numpy" and "scipy", use no abbreviations, recommend
explicit imports, as in:
from numpy import sin
y = sin(x)
4. Change the names "numpy" and "scipy" to "np" and "sp" in their next
major releases, protect the words "numpy" and "scipy" for backward
compatibility, as in:
import np
y = np.sin(x)
5. Change the names "numpy" and "scipy" to "np" and "sp" in their next
major releases, protect the words "numpy" and "scipy" for backward
compatibility, AND recommend explicit imports, as in:
from np import sin
y = sin(x)
6. Keep the names "numpy" and "scipy", use no abbreviations, recommend
import into the top-level namespace, as in:
from numpy import *
y = sin(x)
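
The Q2 tally rule (normalize per person, sum per option, most points
wins) could be sketched like this. Several details are my assumptions,
since the rule above does not spell them out: a ballot is a dict from
option number to the raw text entered, "solely numbers" is read as
non-negative whole numbers in ASCII digits, a ballot with no valid
points contributes nothing, and a tie goes to the lowest-numbered
option. The function names are hypothetical.

```python
def normalize_ballot(ballot, options):
    """Turn one person's raw entries into shares that sum to 1 (or all 0)."""
    points = {}
    for option in options:
        entry = str(ballot.get(option, "")).strip()
        # Entries not consisting solely of digits are treated as 0.
        is_number = entry.isascii() and entry.isdigit()
        points[option] = int(entry) if is_number else 0
    total = sum(points.values())
    if total == 0:
        # Assumption: a ballot with no valid points adds nothing.
        return {option: 0.0 for option in options}
    return {option: value / total for option, value in points.items()}


def tally(ballots, options):
    """Sum normalized shares per option and pick the highest total."""
    totals = {option: 0.0 for option in options}
    for ballot in ballots:
        for option, share in normalize_ballot(ballot, options).items():
            totals[option] += share
    # Assumption: max() keeps the first (lowest-numbered) option on a tie.
    winner = max(options, key=lambda option: totals[option])
    return totals, winner


options = [1, 2, 3, 4, 5, 6]
ballots = [
    {1: "10", 3: "30"},    # shares: option 1 gets 0.25, option 3 gets 0.75
    {2: "five", 3: "5"},   # "five" counts as 0, so option 3 gets 1.0
    {2: "1", 3: "1"},      # options 2 and 3 get 0.5 each
]
totals, winner = tally(ballots, options)
print(totals, winner)
```

Normalizing per person means someone who enters 1000 points has no
more influence than someone who enters 10; only the proportions on
each ballot matter.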
As a final note, I do recognize that the idea of renaming numpy and
scipy permanently is a radical API break. *I am not endorsing this
idea.* However, a number of other API breaks have been proposed
recently, including the sensible change to the behavior of median(),
the de facto proposal to make 'np', 'sp', and 'plt' retroactively
reserved words in the top-level Python namespace,
matrix/ufunc/boolean/ma behavior, etc. This indicates to me that
numpy and scipy are not close to maturity. The lack of reference and
user documentation is another indicator that we are not close to
maturity.
We need to reach that mature stage soon, so that people can depend on
their code investments in numpy. This will require a full community
review that has not yet occurred, an agreement on the final API, and
then a hard commitment against incompatible changes.
Our commercial competitors have gained their large followings in large
part because of their API stability over several decades. If we are
to compete, we will need to shake out the needed changes, freeze the
API, and formally commit against incompatible change to the extent
humanly possible. I think we start that with numpy and I can see it
completing about 2-3 years from now (including getting a real user
manual written). How we do it with scipy is a different story, but I
can see it happening more easily piecemeal than all at once.
--jh--