[SciPy-User] Proposal for a new data analysis toolbox

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Nov 24 08:28:19 EST 2010


On Wed, Nov 24, 2010 at 7:43 AM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Wed, Nov 24, 2010 at 6:54 AM,  <josef.pktd at gmail.com> wrote:
>> On Wed, Nov 24, 2010 at 2:56 AM, Dag Sverre Seljebotn
>> <dagss at student.matnat.uio.no> wrote:
>>> On 11/23/2010 10:17 PM, Keith Goodman wrote:
>>>> On Tue, Nov 23, 2010 at 1:09 PM, Sebastian Haase<seb.haase at gmail.com>  wrote:
>>>>
>>>>> On Tue, Nov 23, 2010 at 8:23 PM, Keith Goodman<kwgoodman at gmail.com>  wrote:
>>>>>
>>>>>> On Mon, Nov 22, 2010 at 7:35 AM, Keith Goodman<kwgoodman at gmail.com>  wrote:
>>>>>>
>>>>>>> This thread started on the numpy list:
>>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053958.html
>>>>>>>
>>>>>> Based on the feedback I got on the scipy and numpy lists, I expanded
>>>>>> the focus of the Nanny project from A to B, where
>>>>>>
>>>>>> A = Faster, drop-in replacement of the NaN functions in Numpy and Scipy
>>>>>> B = Fast, NaN-aware descriptive statistics of NumPy arrays
>>>>>>
>>>>>> I also renamed the project from Nanny to dsna (descriptive statistics
>>>>>> of numpy arrays) and dropped the nan prefix from all function names
>>>>>> (the package is simpler if all functions are NaN aware). A description
>>>>>> of the project can be found in the readme file here:
>>>>>>
>>>>>> http://github.com/kwgoodman/dsna
>>>>>>
>>>>> Nanny did have the advantage of being "catchy" - and easy to remember... !
>>>>> no chance of remembering a 4 ("random") letter sequence....
>>>>> If you want to change the name, I suggest including the idea of
>>>>> speed/cython/.. or so -- wasn't that the original idea ....
>>>>>
>>>> I couldn't come up with anything. I actually named the project STAT
>>>> but then couldn't import ipython because python has a stat module.
>>>> Ugh. I'd like a better name so I am open to suggestions. Even an
>>>> unrelated word would be good, you know, like Maple.
>>>>
>>>
>>> This feels like the kind of functionality that, once it is there, people
>>> might start to take for granted. In those cases I think finding a boring
>>> name is proper :-)
>>>
>>> So how about something boring under the scikits namespace.
>>> scikits.datautils, scikits.arraystats, ...
>>>
>>> If one wants to be cute, perhaps "scikits.missing", for functions that
>>> deal well with missing data (unless I misunderstand, I don't use NaN
>>> much myself).
>>>
>>> I guess "Missing" by itself would be rather un-Googlable :-)
>>
>> I think having a good name for search engines makes a name more practical
>> (compare a search for statsmodels with a search for pandas or larry)
>>
>> "nanstats" only shows similar programs to what this will be
>> "nandata" doesn't seem to be used yet
>>
>> "nanpy" looks like a worm
>> "pynan" "pynans" google thinks it's a misspelling
>>
>> I like boring and descriptive
>>
>> Josef
>>
>>>
>>> Dag Sverre
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>
>
> Totally missed this thread last couple days!
>
> +1 for a boring name, like datalib, datautils, etc. How about datalib,
> or does that name conflict with another Python project? I also wouldn't want the
> name to be too narrowly focused (e.g. names with "nan" in them)
> because it's not really about NaN-- it's about having the tools you
> need to work with any kind of data.

or datatools

datalib would indicate c compiled code, but google thinks it refers to
data libraries, archives, collections
datautils and datatools just get a bit of competition from java

but adding py in front is too boring.

I also agree with Wes that over time an expanding set of utility
functions for data handling/analysis can be added, rather than
restricting the package to NaN-aware descriptive statistics.

Josef

>
> I am not for placing arbitrary restrictions or having a strict
> enumeration on what goes in this library. I think having a practical,
> central dumping ground for data analysis tools would be beneficial. We
> could decide about having "spin-off" libraries later if we think
> that's appropriate.
>
> - Wes
>


