[Numpy-discussion] numpy 1.7.0 release?
wesmckinn at gmail.com
Tue Dec 6 17:13:43 EST 2011
On Tue, Dec 6, 2011 at 4:11 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
> On Mon, Dec 5, 2011 at 8:43 PM, Ralf Gommers <ralf.gommers at googlemail.com>
>> Hi all,
>> It's been a little over 6 months since the release of 1.6.0 and the NA
>> debate has quieted down, so I'd like to ask your opinion on the timing of
>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small
>> improvements, plus three larger chunks of work:
>> - datetime
>> - NA
>> - Bento support
>> My impression is that both datetime and NA are releasable, but should be
>> labeled "tech preview" or something similar, because they may still see
>> significant changes. Please correct me if I'm wrong.
>> There's still some maintenance work to do and pull requests to merge, but
>> a beta release by Christmas should be feasible.
> To be a bit more detailed here, these are the most significant pull requests
> / patches that I think can be merged with a limited amount of work:
> meshgrid enhancements: http://projects.scipy.org/numpy/ticket/966
> sample_from function: https://github.com/numpy/numpy/pull/151
> loadtable function: https://github.com/numpy/numpy/pull/143
> Other maintenance things:
> - un-deprecate putmask
> - clean up causes of "DType strings 'O4' and 'O8' are deprecated..."
> - fix failing einsum and polyfit tests
> - update release notes
>> What do you all think?
This isn't the place for this discussion but we should start talking
about building a *high performance* flat file loading solution with
good column type inference and sensible defaults, etc. It's clear that
loadtable is aiming for the highest compatibility -- for example, I can read
a 2800x30 file in < 50 ms with the read_table / read_csv functions I
recently wrote in Cython (compared with loadtable taking > 1s as
quoted in the pull request), but I don't handle European decimal
formats and lots of other sources of unruliness. I personally don't
believe in sacrificing an order of magnitude of performance in the 90%
case for the 10% case-- so maybe it makes sense to have two functions
around: a superfast custom CSV reader for well-behaved data, and a
slower, but highly flexible, function like loadtable to fall back on.
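To make the "superfast reader for well-behaved data" idea concrete, here is a minimal sketch of the column type inference such a reader would need. The function names (infer_column_dtype, read_csv_fast) are hypothetical, not part of any proposed API, and a real implementation would do this in C or Cython rather than pure Python:

```python
import csv
import numpy as np

def infer_column_dtype(values):
    """Infer a dtype for a column of strings: try int, then float, else object."""
    for target in (np.int64, np.float64):
        try:
            for v in values:
                target(v)  # raises ValueError if the string doesn't parse
            return np.dtype(target)
        except ValueError:
            continue
    return np.dtype(object)

def read_csv_fast(path):
    """Read a well-behaved CSV (header row, no quirks) into typed NumPy arrays."""
    with open(path) as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = list(zip(*reader))  # transpose rows into columns
    return {name: np.array(col, dtype=infer_column_dtype(col))
            for name, col in zip(header, columns)}
```

A two-function design would dispatch to something like this first and fall back to the slower, more tolerant loadtable path when parsing fails.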
I think R has two functions, read.csv and read.csv2, where read.csv2 is
capable of dealing with things like the European decimal format.
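For what the flexible fallback path has to handle, here is one way to cope with European-style numbers (semicolon delimiter, comma decimal, period thousands separator) using the converters hook that np.loadtxt already has; euro_float is a hypothetical helper, not an existing NumPy function:

```python
import io
import numpy as np

def euro_float(s):
    """Parse a European-format number like '1.234,56' as a float."""
    if isinstance(s, bytes):  # some NumPy versions pass bytes to converters
        s = s.decode()
    return float(s.replace(".", "").replace(",", "."))

# semicolon-delimited data with comma decimals, as read.csv2 would expect
data = io.StringIO("1.234,5;2,5\n7,0;8,25\n")
arr = np.loadtxt(data, delimiter=";",
                 converters={0: euro_float, 1: euro_float})
```

Calling a Python-level converter per field is exactly the kind of overhead the fast path would avoid by only accepting well-behaved input.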