[IPython-dev] Integrating pandas into pylab

Brian Granger ellisonbg at gmail.com
Wed Oct 26 16:36:01 EDT 2011


Satra,

On Wed, Oct 26, 2011 at 10:36 AM, Satrajit Ghosh <satra at mit.edu> wrote:
> hi all,
>
>>
>> > That said, the major problem is that the current organization of the
>> > major packages is not logical or intuitive.  numpy has arrays,
>> > algorithms and IO, scipy has algorithms and IO, matplotlib has
>> > plotting and algorithms and IO, pandas has datastructures, IO,
>> > algorithms and plotting (albeit all organized around the dataframe).
>> > And so on.  I think there is room for a namespace package that
>> > integrates across these and makes it more intuitive.  The proper top
>> > level namespaces are something like: array (or datastructures more
>> > generally), algo, plot, io.  In this model, you would pull the
>> > relevant components from numpy, scipy, mpl, pandas, scikits, ETS, etc
>> > into the relevant namespaces.
>>
>> I would not want to make the new namespace nested - it should be flat
>> like the current pylab is.  I think the main point here is that we are
>> competing with Matlab, Mathematica, etc. which all have completely
>> flat namespaces.  Not that we want to copy everything these packages
>> do, but for new users, non-technical folks, undergraduates the entire
>> idea of namespaces is confusing.  The problem (for these users) is
>> namespaces themselves, not just that the existing ones are confusing.
>
> i don't think what confuses new users is whether the namespace is flat or
> nested, but where to get the functionality that they are looking for and if
> in the nested namespace there are multiple implementations of the
> functionality, which one to use.

I think we are talking about different sets of users:

* I am referring to *complete newbies* who have zero programming
experience, barely know what a for loop is and are struggling to
figure out how to index a numpy array.  Examples would be your typical
1st-3rd year undergraduate student in engineering, physics, chemistry,
biology, etc.  I spend my days surrounded by these types of users (I
teach undergraduate university physics) and it is completely
overwhelming for them to learn a new programming language along with a
set of libraries for numerical work.  Asking them to track which
functions come from which namespaces and get the imports right for
each of them each time they write a new script is simply too great a
cognitive load.  These users need a flat namespace they can import *
from and, as you point out, a convenient way of searching that
namespace to find what they need.

* You are picturing users that have extensive technical backgrounds,
likely grad students, postdocs or beyond who are new to Python or
numpy/scipy/matplotlib but not new to programming, text editors,
command prompt, etc.  For these users I completely agree with your
analysis.  Asking these users to understand and manage namespaces is
not much of a problem and it is mostly an issue of helping them to
find what they are looking for.

> even in matlab, which has a flat namespace, users need to do 'lookfor' to
> search for available functions. and in cases where additional toolboxes are
> installed which may have the same function filename, it's a nightmare.
> so i think the notion of namespace is crucial in the scientific computing
> world and the import statement is not the biggest block that new users face.
> most often, it is: 'which function solves my problem?'. for example, *once*
> you know 'plot', 'hist' and 'imshow' are in pyplot, it's really trivial for
> most users to type:
> ---
> import matplotlib.pyplot as plt
> or
> from matplotlib.pyplot import plot, hist, imshow
> ---

For your "advanced" new users I agree, but surely not for the class of
new users I am thinking about.

> i agree with john that having namespaces like:
> stats
> io
> plot
> linalg
> might actually be useful in categorizing and looking for functionality,
> which i believe was one of the intents of namespaces in the first place.

I agree.

> if the goal is to provide matlab compatibility and assume that function
> names and their calling structures are identical, then the flat namespace
> helps. however, beyond some routines, this is impossible to maintain.

> i think scientific computing in python would do very well if some mechanism
> like 'lookfor' provided users (especially new users) a good way to find
> functionality. i don't think the flat namespace solves this problem.

I agree that the lookfor functionality is needed regardless of what
happens with the namespaces and could be implemented equally well for
flat or nested namespaces.
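
To make the lookfor idea concrete, here is a rough sketch of a keyword
search over function names and docstrings. It is only an illustration:
the `lookfor` function and its signature are hypothetical, and the
standard-library modules `statistics` and `math` stand in for the
scientific packages so that the example runs anywhere.

```python
import importlib
import inspect


def lookfor(keyword, module_names):
    """Return sorted (qualified_name, summary) pairs that mention keyword.

    Hypothetical helper: it searches the public callables of each named
    module and matches the keyword against names and docstrings.
    """
    keyword = keyword.lower()
    hits = []
    for module_name in module_names:
        module = importlib.import_module(module_name)
        for name, obj in vars(module).items():
            # Skip private names and non-callables (constants, submodules).
            if name.startswith("_") or not callable(obj):
                continue
            doc = inspect.getdoc(obj) or ""
            if keyword in name.lower() or keyword in doc.lower():
                summary = doc.splitlines()[0] if doc else ""
                hits.append((f"{module_name}.{name}", summary))
    return sorted(hits)


if __name__ == "__main__":
    # Stand-in modules; a real tool would search numpy, scipy, etc.
    for qualified_name, summary in lookfor("median", ["statistics", "math"]):
        print(f"{qualified_name}: {summary}")
```

Because each hit is reported with its module name, the same search works
whether the user ends up calling the function through a flat or a nested
namespace.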

In summary, there are three mostly-orthogonal issues:

1) The need for a flat namespace for truly beginning users.
2) The need to consolidate/organize the existing messy and overlapping
namespaces of numpy/scipy/matplotlib.
3) The need for lookfor-type functionality for all users.

It is possible that 1 and 2 could be solved in a single package.
There could be well-organized io, linalg, algo, etc. submodules and an
"all" module that imports everything for users who need/want it.
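
As a rough illustration of the "all" module idea (not a design for the
actual package), the sketch below merges the public names of several
modules into one flat namespace and records any name that more than one
module defines. The helper `build_flat_namespace` and the module name
"everything" are hypothetical, and `math`, `cmath` and `statistics`
are stand-ins for the organized submodules.

```python
import importlib
import types


def build_flat_namespace(name, module_names):
    """Merge public names from several modules into one flat module.

    Returns (flat_module, collisions), where collisions maps a name to
    the list of modules that define it. Later modules win on conflict.
    """
    flat = types.ModuleType(name)
    origins = {}      # attribute name -> module that currently provides it
    collisions = {}   # attribute name -> every module that defined it
    for module_name in module_names:
        module = importlib.import_module(module_name)
        # Honor __all__ when present, otherwise take non-underscore names.
        public = getattr(module, "__all__", None)
        if public is None:
            public = [n for n in vars(module) if not n.startswith("_")]
        for attr in public:
            value = getattr(module, attr)
            if attr in origins and getattr(flat, attr) is not value:
                collisions.setdefault(attr, [origins[attr]]).append(module_name)
            setattr(flat, attr, value)
            origins[attr] = module_name
    return flat, collisions


flat, collisions = build_flat_namespace(
    "everything", ["math", "cmath", "statistics"]
)
print(flat.median([1, 2, 3]))    # reachable without knowing its module
print(sorted(collisions))        # names defined by more than one module
```

The collision report is the part that matters for issue 2: names such
as `sqrt` exist in both `math` and `cmath`, so a flat namespace has to
pick one, and the organized submodules are what keep the others
reachable.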

Cheers,

Brian


> cheers,
> satra



-- 
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com


