[SciPy-User] Proposal for a new data analysis toolbox

Wes McKinney wesmckinn at gmail.com
Wed Nov 24 07:43:08 EST 2010


On Wed, Nov 24, 2010 at 6:54 AM,  <josef.pktd at gmail.com> wrote:
> On Wed, Nov 24, 2010 at 2:56 AM, Dag Sverre Seljebotn
> <dagss at student.matnat.uio.no> wrote:
>> On 11/23/2010 10:17 PM, Keith Goodman wrote:
>>> On Tue, Nov 23, 2010 at 1:09 PM, Sebastian Haase<seb.haase at gmail.com>  wrote:
>>>
>>>> On Tue, Nov 23, 2010 at 8:23 PM, Keith Goodman<kwgoodman at gmail.com>  wrote:
>>>>
>>>>> On Mon, Nov 22, 2010 at 7:35 AM, Keith Goodman<kwgoodman at gmail.com>  wrote:
>>>>>
>>>>>> This thread started on the numpy list:
>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>>>>>>
>>>>> Based on the feedback I got on the scipy and numpy lists, I expanded
>>>>> the focus of the Nanny project from A to B, where
>>>>>
>>>>> A = Faster, drop-in replacement of the NaN functions in Numpy and Scipy
>>>>> B = Fast, NaN-aware descriptive statistics of NumPy arrays
>>>>>
>>>>> I also renamed the project from Nanny to dsna (descriptive statistics
>>>>> of numpy arrays) and dropped the nan prefix from all function names
>>>>> (the package is simpler if all functions are NaN aware). A description
>>>>> of the project can be found in the readme file here:
>>>>>
>>>>> http://github.com/kwgoodman/dsna
>>>>>
>>>> Nanny did have the advantage of being "catchy" and easy to remember!
>>>> There's no chance of remembering a random 4-letter sequence.
>>>> If you want to change the name, I suggest including the idea of
>>>> speed/Cython or similar; wasn't that the original idea?
>>>>
>>> I couldn't come up with anything. I actually named the project STAT
>>> but then couldn't import IPython because Python has a stat module.
>>> Ugh. I'd like a better name so I am open to suggestions. Even an
>>> unrelated word would be good, you know, like Maple.
>>>
>>
>> This feels like the kind of functionality that, once it is there, people
>> might start to take for granted. In those cases I think finding a boring
>> name is proper :-)
>>
>> So how about something boring under the scikits namespace.
>> scikits.datautils, scikits.arraystats, ...
>>
>> If one wants to be cute, perhaps "scikits.missing", for functions that
>> deal well with missing data (unless I misunderstand, I don't use NaN
>> much myself).
>>
>> I guess "Missing" by itself would be rather un-Googlable :-)
>
> I think having a good name for search engines makes a name more practical
> (compare a search for statsmodels with a search for pandas or larry)
>
> "nanstats" only shows similar programs to what this will be
> "nandata" doesn't seem to be used yet
>
> "nanpy" looks like a worm
> "pynan", "pynans": Google thinks it's a misspelling
>
> I like boring and descriptive
>
> Josef
>
>>
>> Dag Sverre
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>

Totally missed this thread the last couple of days!

+1 for a boring name, like datalib, datautils, etc. How about datalib,
or is there a name conflict with another Python project? I also
wouldn't want the name to be too narrowly focused (e.g. names with
"nan" in them), because it's not really about NaN; it's about having
the tools you need to work with any kind of data.

I am not for placing arbitrary restrictions or having a strict
enumeration on what goes in this library. I think having a practical,
central dumping ground for data analysis tools would be beneficial. We
could decide about having "spin-off" libraries later if we think
that's appropriate.

- Wes
