Re: [SciPy-dev] Namespaces in documentation

I object to any abbreviations *at all* in the docs. This post lays out my argument against abbreviations, gives two alternatives, and proposes a poll to resolve the issue. There is a comment on numpy API maturity at the end. I'm sorry that it's long.

I think we're making an important step with this decision. We need to make sure that it's the right one, and that the community buys that it's the right one so that the issue stays resolved.

What we are writing is the formal documentation to a software package, not a bunch of informal recipes. How we code the doc examples is correctly perceived as how we recommend everyone write their own code.

Is there any other place in Python (or indeed in computer science) where a package advocates referring to itself by something other than its own name, and documents itself that way? Certainly such cases are few. Doing so is a loud community declaration that the package authors made a serious mistake in calling it something the community can't actually tolerate using, even in writing formal documentation, where a few extra characters are not a big fraction of the effort. I don't like to think that this is the case, but more on that later.

Aside from the embarrassment of declaring a mistake on every page of documentation, the proposed path makes the documentation not work for several classes of users who should by all rights have it work for them. These alienated users have more primary rights than those for whom abbreviations are conveniences, which admittedly is most of us (including me). They include:

1. Those who were using np or sp or plt as variables. These are *not* reserved words, and we have seen that a substantial number of people use them. It is their legal *right* to do so, and to expect that everything, including the examples in the docs, will work for them when they do! The alias import would wipe out their legitimate variables.

2. Those not using iPython, for whatever reason.
Numpy and scipy are python packages, not iPython packages (I don't think there are any pure iPython packages). We must always support using the packages, and their docs, in native and unpolluted python sessions, because all code is compiled and run by python. Someone using python natively because they want to be as close to the metal as possible should not have to sacrifice access to any part of the docs to do that. If they need to import numpy some other way because their code depends on it, they can't run the doc examples. (Don't get me wrong, iPython is great, and I use it).

There is a third problem that could come later. Currently, it is likely the case that no other Python package is called "np" or "sp". Certainly no future package will be called "numpy", but can we be sure no package will call itself "np"? I don't think we can guarantee that if it's just a convenient abbreviation for us. If one does, the abbreviation we choose now will conflict with that package in the future, and the other package would win because convenience abbreviations are distant second-class citizens to package names (and likely because numpy is a rather niche use and has a correspondingly small representation among all Python users). We'd then have to change the docs, and retrain the entire community from our entrenched use habit to something else (and reopen the debate as to what that name would be). Whoever in our community needed the new np package would have to rewrite old code to get it to work, if they followed the suggested convention.

The behavior we are engaging in is arranging a convenience for the majority without regard for the minority. Producers do this all the time and it invariably ends up having negative repercussions for some.
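Both clashes described above (a user's own variable named np, and a possible future package named np) come down to ordinary name rebinding. The following is a minimal sketch, not taken from the thread. It assumes the standard-library math module as a stand-in for numpy so that it runs without third-party packages, and the variable names are hypothetical.

```python
import math  # stand-in for numpy (assumption: keeps the sketch dependency-free)

np = 12                   # a user's own variable, e.g. a number of points
user_value_before = np    # remember what the user had stored

import math as np         # pasting an aliased doc example rebinds the name "np"

# The user's integer is gone; "np" now refers to the module object.
alias_clobbered_variable = (np is math) and (user_value_before == 12)

# An explicit import binds only the names it lists, so an unrelated
# variable called "sp" is left alone.
sp = "sample period"
from math import sin
sp_survived = (sp == "sample period")

print(alias_clobbered_variable, sp_survived)
```

An explicit import is not immune either: `from math import sin` would replace a user variable that happened to be called `sin`. The difference is only in which names are at risk.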
Consider how little different these sound:

    "run in ipython"
    "use these abbreviations"
    "only use xxx browser"
    "no support for Linux/Mac/whatever"
    "the only supported client is Outlook"

Each of these decisions loses only a small minority, but do it enough times and our remaining "majority" looks small indeed. These expedience decisions invariably restrict flexibility and alienate users, most of whom will never be known to us. We must therefore avoid making such decisions wherever possible, and it *is* possible in this case. That we are nonetheless doing it so that a dozen people writing docs don't have to type just three more characters per function call is baffling to me. These are not lazy people.

So, I'm being driven to accept something deeper, that a substantial fraction of the community *is* saying that typing "numpy." is unacceptable, even in the formal documentation, even in the numpy code itself, not out of laziness but because it pollutes the code, or their thinking about the code as they write it. If this is true, it is a declaration that the package is either misnamed or that our use recommendation is drastically wrong, and we need to change accordingly. So here are some possibilities:

1. Let's take the simpler case first, the idea that typing "numpy." is unacceptable, but that "np." might be ok. Then, the package really *should* have been called "np" and not "numpy". If this is really the case, then we should recognize that the project is still in its infancy, and that we should fix the problem in release 2.0 by calling the package "np" (still NumPy when written in text), and by its 1.0 release call SciPy "sp". For backward compatibility, we should reserve the words "numpy" and "scipy", which nobody now uses as variables, so that people can do

       import np as numpy
       import sp as scipy

   at the top of their old code. This is a radical suggestion; see below for a comment on radical API changes.
   I do not advocate this suggestion, but I think you must accept it (or a variant) if you are simply unwilling to type "numpy.". Call it the "modest proposal", with apologies to Jonathan Swift.

2. Alternatively, we may accept that typing *anything* before the function call name is at best pollution and possibly also obfuscation. In what other programming language do you say

       y = weij.sin(x)

   instead of

       y = sin(x)

   ? None I can think of, and many people find it particularly onerous in interactive mode. So, why not be linguists, and recognize that how people actually want to use a language *is* its grammar? We should thus encourage

       from numpy import xxx, yyy, zzz

   for all functions in a given chunk of code. This would allow direct references to the function calls without a name in front of them. For the docs, that would mean putting that line at the top of the examples section of each docstring (to combat the disdained "import *"). This would make the code in the examples look a lot cleaner and clearer. It would simplify our personal code, too. This is the solution I favor.

N1. I am absolutely opposed to any solution that *implies* a peculiar import, such as that if numpy.fft.fft is in the top-level namespace, then so is numpy.fft.rfft. New users will be lost with this assumption since importing and namespaces (especially with abbreviations in scope) are not familiar to them. They will then not be able to run the doc examples they need in order to learn the package.

N2. I also oppose any solution that encourages one usage interactively and another in code. While anyone may choose such a path for themselves, and there may be good reasons for doing so, it is an unacceptable extra layer of complication for new users for us to be advocating such an approach.

The decision we are making in this debate is a big one. The examples in the doc *are how we recommend people code*.
Since that affects everyone and not just code in numpy itself, the decision should be made by the community as a whole. I propose the following:

We discuss this through Wednesday 4 June. On Thursday we put up a poll, keep it open through Wednesday 11 June, and stick with the results. The poll should declare the project's recommendation for how people code, and that recommendation should be reflected in the docs.

Since there is a group that might need protection, and since I know many of us are comfortable with more than one solution, I propose this poll:

Q1: True or False: I have a significant code investment in the variable(s) 'np', 'sp', or 'plt'; use of one or more of these as an abbreviation or package name would hurt my work seriously.

Q2: Distribute points among the following. These will be normalized per person and tallied per option. The option with the most cumulative points wins. Entries not consisting solely of numbers are treated as 0:

1. Keep the names "numpy" and "scipy", use no abbreviations, recommend full names, as in:

       import numpy
       y = numpy.sin(x)

2. Keep the names "numpy" and "scipy", use "np" and "sp" abbreviations, as in:

       import numpy as np
       y = np.sin(x)

3. Keep the names "numpy" and "scipy", use no abbreviations, recommend explicit imports, as in:

       from numpy import sin
       y = sin(x)

4. Change the names "numpy" and "scipy" to "np" and "sp" in their next major releases, protect the words "numpy" and "scipy" for backward compatibility, as in:

       import np
       y = np.sin(x)

5. Change the names "numpy" and "scipy" to "np" and "sp" in their next major releases, protect the words "numpy" and "scipy" for backward compatibility, AND recommend explicit imports, as in:

       from np import sin
       y = sin(x)

6. Keep the names "numpy" and "scipy", use no abbreviations, recommend import into the top-level namespace, as in:

       from numpy import *
       y = sin(x)

As a final note, I do recognize that the idea of renaming numpy and scipy permanently is a radical API break.
*I am not endorsing this idea.* However, a number of other API breaks have been proposed recently, including the sensible change to the behavior of median(), the de-facto proposal to make 'np', 'sp', and 'plt' retroactively reserved words in the top-level Python namespace, matrix/ufunc/boolean/ma behavior, etc. This indicates to me that numpy and scipy are not close to maturity. The lack of reference and user documentation is another indicator that we are not close to maturity.

We need to reach that mature stage soon, so that people can depend on their code investments in numpy. This will require a full community review that has not yet occurred, an agreement on the final API, and then a hard commitment against incompatible changes. Our commercial competitors have gained their large followings in large part because of their API stability over several decades. If we are to compete, we will need to shake out the needed changes, freeze the API, and formally commit against incompatible change to the extent humanly possible.

I think we start that with numpy and I can see it completing about 2-3 years from now (including getting a real user manual written). How we do it with scipy is a different story, but I can see it happening piecemeal more easily than all at once.

--jh--
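The scoring rule proposed for Q2 of the poll in the message above (normalize each person's points, sum them per option, treat non-numeric entries as 0) can be sketched as follows. This is only an illustration under stated assumptions: the function names and sample ballots are hypothetical, "solely of numbers" is read as ASCII digits only, and a ballot whose entries sum to zero contributes nothing.

```python
import re

OPTIONS = [1, 2, 3, 4, 5, 6]  # the six Q2 choices


def normalize_ballot(raw_entries):
    """Turn one voter's raw text entries into shares that sum to 1 (or all 0)."""
    points = {}
    for option in OPTIONS:
        text = str(raw_entries.get(option, "")).strip()
        # Entries not consisting solely of digits count as 0 (assumed reading).
        points[option] = int(text) if re.fullmatch(r"[0-9]+", text) else 0
    total = sum(points.values())
    if total == 0:
        return {option: 0.0 for option in OPTIONS}
    return {option: value / total for option, value in points.items()}


def tally(ballots):
    """Sum the normalized shares per option across all voters."""
    totals = {option: 0.0 for option in OPTIONS}
    for ballot in ballots:
        for option, share in normalize_ballot(ballot).items():
            totals[option] += share
    return totals


# Hypothetical ballots, for illustration only.
sample_ballots = [
    {1: "10", 3: "30"},    # shares: option 1 -> 0.25, option 3 -> 0.75
    {2: "5", 3: "lots"},   # "lots" is not a number, so option 2 gets 1.0
    {2: "1", 3: "1"},      # shares: 0.5 each
]
totals = tally(sample_ballots)
winner = max(totals, key=totals.get)  # ties resolve to the lowest-numbered option
print(totals, winner)
```

Normalizing per person means a voter who writes 1000 points has no more influence than one who writes 10. How ties should be broken is not specified in the proposal, so the choice here is arbitrary.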

On Tue, Jun 3, 2008 at 9:11 AM, Joe Harrington <jh@physics.ucf.edu> wrote:
> Aside from the embarrassment of declaring a mistake on every page of documentation, the proposed path makes the documentation not work for several classes of users who should by all rights have it work for them. These alienated users have more primary rights than those for whom abbreviations are conveniences, which admittedly is most of us (including me). They include:

This is not an issue of having made a mistake that we are trying to hide by using an alias. It is an issue of trying to strike the right balance between readability and usability. Many of us write a lot of code in three different environments: scripts, production libraries, and interactively from the python shell, and find that the recommended import semantics strike the right balance. Ideally, we would like one solution that works pretty well in all three contexts so we don't have to do the mental context switching: "I'm scripting now so I should do this, I'm at the python shell now so I should do that, I'm working on mpl src now so do something else".

Interactively from the python shell, it is easiest to type

    >>> sin(2*pi*t)*exp(-t)

so it is nice to first do

    >>> from numpy import sin, pi, exp

But this idiom doesn't work very well for large pieces of production code. We did this in matplotlib for years, and were constantly moving back and forth from the import section to the code section to find out if a symbol is already imported. It is very nice when writing large pieces of code to know that some name is available, so we can do

    x = numpy.arange(10.)
    y = numpy.sin(2*numpy.pi*x) * numpy.exp(x)

w/o having to go looking at the import layer. That is manageable, but a bit ugly, which is why most people prefer an alias, and these aliases are *widely* used throughout the entire code base of numpy, scipy and matplotlib. This is not an embarrassment or an admission of a mistake, it is taking advantage of a language feature benevolently bestowed upon us by Guido.

The nice thing about the

    import numpy as np

alias is that it is sufficiently short that you can use it in scripts and interactive sessions, and sufficiently mnemonic that you can use it in production code. It is also easy to type, as are the other recommended aliases.
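The three spellings compared above differ only in how names are looked up, not in what gets computed. A small sketch, assuming the standard-library math module as a stand-in for numpy so that it runs anywhere (the alias `m` and the variable names are hypothetical):

```python
import math                        # full-name style
import math as m                   # alias style (plays the role of "np")
from math import sin, pi, exp      # explicit-import style

t = 0.25

full_name = math.sin(2 * math.pi * t) * math.exp(-t)
aliased = m.sin(2 * m.pi * t) * m.exp(-t)
bare = sin(2 * pi * t) * exp(-t)

# All three expressions call the same functions on the same arguments.
print(full_name == aliased == bare)
```

With the prefixed forms, a reader can tell where `sin` comes from on every line. With the bare form, that information lives only in the import section, which is the back-and-forth described for the matplotlib sources.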
Of course you are right that we run the risk of clashes with other people who will be using np as a variable name or another package (a little unlikely on the latter since it would be pretty foolish to name a package with two letters for precisely the reason that it is likely to clash). Recognizing that aliases are a good thing, and any alias will clash sometimes, several of us got together (including lead developers of ipython, numpy, scipy and matplotlib) and decided to recommend a usage, precisely to lower the risk of clashes. If we are consistent in the usage in the docs and code, and publicly recommend it, then clashes are less likely.

As for those worried about an ipython requirement, there isn't one. plain-ol-python supports import configuration as well, so a python shell properly configured for numpy/scipy will support pasting from the docs anyhow.

http://docs.python.org/tut/node4.html#SECTION004240000000000000000

Finally, whatever standard you adopt for the docs, a decision has already been made by the ipython, numpy, scipy and matplotlib developers to use this alias convention in their source code. Since many users eventually become developers, it is nice to have a consistent approach between what is advocated in the user documentation and the source code, where feasible.

JDH
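One way to configure a plain python shell as described above is an interactive startup file, which the interpreter runs before the first prompt when the PYTHONSTARTUP environment variable points at it. The sketch below is an assumption about what such a file might contain, not a file shipped by any of the projects. The file name is hypothetical, and the guard exists only so a session still starts when numpy is not installed.

```python
# Hypothetical startup file, e.g. saved as ~/.pythonstartup.py and enabled with
#   export PYTHONSTARTUP=~/.pythonstartup.py
# It is executed only for interactive sessions, not for scripts.

try:
    import numpy as np   # the alias the doc examples would assume
    HAVE_NUMPY = True
except ImportError:
    HAVE_NUMPY = False   # keep the shell usable without numpy

if HAVE_NUMPY:
    print("numpy available as np")
else:
    print("numpy not installed; doc examples using np will not run")
```

Scripts and library code do not see this file, so they still need their own import line.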

On Tue, Jun 3, 2008 at 9:11 AM, Joe Harrington <jh@physics.ucf.edu> wrote:
> Is there any other place in Python (or indeed in computer science) where a package advocates referring to itself by something other than

From Bjarne Stroustrup, "The C++ Programming Language, 3rd edition", Section 8.2 "Namespaces"::

    If users give their namespaces short names, the names of different namespaces will clash:

        namespace A { // short name, will clash (eventually)
            // ...
        }

    However, long namespaces can be impractical in real code:

        namespace American_Telephone_and_Telegraph { // too long
            // ...
        }

    This dilemma can be resolved by providing a short alias for a longer namespace name:

        // use namespace alias to shorten names:
        namespace ATT = American_Telephone_and_Telegraph;
        ATT::String s3 = "Grieg";

    Namespace aliases allow a user to refer to "the library" and have a single declaration defining what library that really is. For example:

        namespace Lib = Foundation_library_v2r11;

Well said. Because they are so handy, most languages provide facilities for aliases::

    python  : import something as somethingelse
    C++     : namespace new_name = current_name;
    Bash    : alias ls='ls -F'
    C       : #define
    C#      : using colAlias = System.Collections;
    Texinfo : @alias new=existing
    Perl    : use perl Alias

And we needn't look too far beyond our own doors, since there is broad agreement here on the usefulness of namespace aliases.

JDH
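The quoted point that an alias gives "a single declaration defining what library that really is" has a direct Python analogue. A minimal sketch, assuming two standard-library modules that happen to share the `dumps`/`loads` interface (the alias `serializer` and the function `roundtrip` are hypothetical names):

```python
# The one line that decides which library "serializer" really is.
# Swapping it for "import pickle as serializer" would leave the rest untouched,
# because pickle also provides dumps() and loads().
import json as serializer


def roundtrip(value):
    """Encode a value and decode it again through whichever library is aliased."""
    return serializer.loads(serializer.dumps(value))


print(roundtrip({"composer": "Grieg", "opus": [16, 46]}))
```

This mirrors `namespace Lib = Foundation_library_v2r11;`: code refers to the alias, and only the import line knows the concrete module.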
participants (2): Joe Harrington, John Hunter