[Numpy-discussion] numpy 1.7.0 release?

Wed Dec 7 14:45:28 EST 2011

On Tue, Dec 6, 2011 at 4:13 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Tue, Dec 6, 2011 at 4:11 PM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
>>
>>
>> On Mon, Dec 5, 2011 at 8:43 PM, Ralf Gommers <ralf.gommers at googlemail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> It's been a little over 6 months since the release of 1.6.0 and the NA
>>> debate has quieted down, so I'd like to ask your opinion on the timing of
>>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small
>>> improvements, plus three larger chunks of work:
>>>
>>> - datetime
>>> - NA
>>> - Bento support
>>>
>>> My impression is that both datetime and NA are releasable, but should be
>>> labeled "tech preview" or something similar, because they may still see
>>> significant changes. Please correct me if I'm wrong.
>>>
>>> There's still some maintenance work to do and pull requests to merge, but
>>> a beta release by Christmas should be feasible.
>>
>>
>> To be a bit more detailed here, these are the most significant pull requests
>> / patches that I think can be merged with a limited amount of work:
>> meshgrid enhancements: http://projects.scipy.org/numpy/ticket/966
>> sample_from function: https://github.com/numpy/numpy/pull/151
>> loadtable function: https://github.com/numpy/numpy/pull/143
>>
>> Other maintenance things:
>> - un-deprecate putmask
>> - clean up causes of "DType strings 'O4' and 'O8' are deprecated..."
>> - fix failing einsum and polyfit tests
>> - update release notes
>>
>> Cheers,
>> Ralf
>>
>>
>>> What do you all think?
>>>
>>>
>>> Cheers,
>>> Ralf
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> This isn't the place for this discussion but we should start talking
> about building a *high performance* flat file loading solution with
> good column type inference and sensible defaults, etc. It's clear that
> loadtable is aiming for highest compatibility-- for example I can read
> a 2800x30 file in < 50 ms with the read_table / read_csv functions I
> wrote myself recently in Cython (compared with loadtable taking > 1s as
> quoted in the pull request), but I don't handle European decimal
> formats and lots of other sources of unruliness. I personally don't
> believe in sacrificing an order of magnitude of performance in the 90%
> case for the 10% case-- so maybe it makes sense to have two functions
> around: a superfast custom CSV reader for well-behaved data, and a
> slower, but highly flexible, function like loadtable to fall back on.
> I think R has two functions read.csv and read.csv2, where read.csv2 is
> capable of dealing with things like European decimal format.
>
> - Wes

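The split Wes describes (a strict fast path plus a flexible fallback)
could be sketched roughly as below. This is only an illustration of the
control flow, not his Cython reader and not loadtable: the names
read_fast, read_flexible and read_table are hypothetical, and the only
"unruliness" handled is the European decimal comma.

```python
import numpy as np


def read_fast(text, delimiter):
    """Hypothetical strict path: every field must already be a plain float."""
    rows = [line.split(delimiter) for line in text.splitlines() if line.strip()]
    # Raises ValueError on fields such as "1,5" or on ragged rows.
    return np.array(rows, dtype=float)


def read_flexible(text, delimiter):
    """Hypothetical fallback: also accept a decimal comma, e.g. "1,5"."""
    rows = [
        [float(field.replace(",", ".")) for field in line.split(delimiter)]
        for line in text.splitlines()
        if line.strip()
    ]
    return np.array(rows)


def read_table(text, delimiter=";"):
    """Try the strict path first and fall back only when it fails."""
    try:
        return read_fast(text, delimiter)
    except ValueError:
        return read_flexible(text, delimiter)


well_behaved = read_table("1.5;2\n3;4\n")
european = read_table("1,5;2\n3;4,25\n")
```

Well-behaved input never pays for the fallback in this arrangement; only
files that fail the strict parse take the slower route.
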
I do not agree with the loadtable pull request, simply because I do not
want to have functions that do virtually the same thing, as noted in the
comments on the pull request (and in Chris's email on 'Fast Reading of
ASCII files'). I would like to see a valid user-space justification for
including it, because just using regexes is not a suitable justification
(though I agree it is an interesting feature):
If loadtable is to be a complete replacement for genfromtxt, then there
needs to be a plan for supporting all the features of genfromtxt, like
'skip_footer', and genfromtxt needs to be set on the path to
deprecation.
If loadtable is an intermediate between loadtxt and genfromtxt, then it
needs to be clear exactly what loadtable does not do that genfromtxt
does (anything that loadtable does and genfromtxt does not do should be
filed as a bug against genfromtxt).
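
As a concrete instance of the kind of genfromtxt feature a replacement
would have to cover, here is a small sketch of 'skip_footer'. The sample
text is made up for illustration; only np.genfromtxt and its delimiter
and skip_footer arguments are real.

```python
import io

import numpy as np

# Made-up sample: two data rows followed by one trailer line.
text = "1,2\n3,4\nend,of file\n"

# skip_footer=1 drops the last line before the values are converted.
data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_footer=1)
print(data)
```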

Knowing which case applies makes it easier to help users by directing
them to the appropriate function, and to know which function bug reports
should be filed against. For example, loadtxt requires 'Each row in the
text file must have the same number of values', so one can direct a user
to genfromtxt for that case rather than filing a bug report against
loadtxt.
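
A minimal sketch of that difference, using made-up data with one empty
field. I am assuming the common case of a missing value here; a row that
is actually shorter than the others is a separate situation, which
genfromtxt also rejects unless invalid_raise=False is passed.

```python
import io

import numpy as np

# Made-up sample: the middle field of the second row is empty.
text = "1,2,3\n4,,6\n"

# loadtxt cannot convert the empty field to a float.
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# genfromtxt fills the missing field with nan instead of failing.
data = np.genfromtxt(io.StringIO(text), delimiter=",")
print(loadtxt_failed)
print(data)
```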

I am also somewhat concerned about the NA object because of the limited
implementation available. For example, numpy.dot is not implemented.
There also appears to be no plan to extend the implementation across
numpy or to support it long term. So while I have no problem with it
being included, I do think there must be a serious commitment to having
it fully supported in the near future, as well as a suitable long-term
roadmap. Otherwise it will just be a problematic code dump that will be
difficult to support.

Bruce