[Numpy-discussion] deprecate fromstring() for text reading?

Benjamin Root ben.v.root at gmail.com
Tue Oct 27 10:30:08 EDT 2015


FWIW, when I needed a fast fixed-width reader for a very large dataset last
year, I found that np.genfromtxt() was faster than pandas' read_fwf().
IIRC, pandas' text reading code fell back to pure Python for fixed-width
scenarios.
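
A minimal sketch of the fixed-width case, assuming made-up sample data and
hypothetical column widths of 4, 6 and 3 characters. It is not the code used
on the dataset mentioned above; it only illustrates that passing a sequence
of integer widths as `delimiter` makes np.genfromtxt() split by position
rather than by a separator character.

```python
import io

import numpy as np

# Hypothetical fixed-width sample: three columns of widths 4, 6 and 3.
sample = io.StringIO(
    "   1  2.50 10\n"
    "  22 13.25  7\n"
    " 333  0.50  0\n"
)

# A sequence of ints as `delimiter` tells genfromtxt to cut each line
# into fields of those widths instead of splitting on a character.
widths = (4, 6, 3)
data = np.genfromtxt(sample, delimiter=widths)

print(data.shape)
print(data)
```

pandas' read_fwf() accepts a `widths` argument for the same kind of layout,
so the two readers can be timed against each other on a given file; which
one is faster will likely depend on the versions involved.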

On Fri, Oct 23, 2015 at 8:22 PM, Chris Barker - NOAA Federal <
chris.barker at noaa.gov> wrote:

> Grabbing the pandas CSV reader would be great, and I hope it happens
> sooner rather than later, though alas, I haven't the spare cycles for it
> either.
>
> In the meantime though, can we put in a deprecation warning when
> fromstring() is used on text files? It's really pretty broken.
>
> -Chris
>
> On Oct 23, 2015, at 4:02 PM, Jeff Reback <jeffreback at gmail.com> wrote:
>
>
>
> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
> On Oct 23, 2015 3:30 PM, "Jeff Reback" <jeffreback at gmail.com> wrote:
> >
> > On Oct 23, 2015, at 6:13 PM, Charles R Harris <charlesr.harris at gmail.com>
> wrote:
> >
> >>
> >>
> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal <
> chris.barker at noaa.gov> wrote:
> >>>
> >>>
> >>>> I think it would be good to keep the usage to read binary data at
> least.
> >>>
> >>>
> >>> Agreed -- it's only the text file reading I'm proposing to deprecate.
> It was kind of weird to cram it in there in the first place.
> >>>
> >>> Oh, fromfile() has the same issues.
> >>>
> >>> Chris
> >>>
> >>>
> >>>> Or is there a good alternative to `np.fromstring(<bytes>,
> dtype=...)`?  -- Marten
> >>>>
> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker <chris.barker at noaa.gov>
> wrote:
> >>>>>
> >>>>> There was just a question about a bug/issue with scipy.fromstring
> (which is numpy.fromstring) when used to read integers from a text file.
> >>>>>
> >>>>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html
> >>>>>
> >>>>> fromstring() is buggy and inflexible for reading text files -- and
> it is a very, very ugly mess of code. I dug into it a while back, and gave
> up -- just too much of a mess!
> >>>>>
> >>>>> So we really should completely re-implement it, or deprecate it. I
> doubt anyone is going to do a big refactor, so that means deprecating it.
> >>>>>
> >>>>> Also -- if we do want a fast "read numbers from text files" function
> (which would be nice, actually), it really should get a new name anyway.
> >>>>>
> >>>>> (and the hopefully coming new dtype system would make it easier to
> write cleanly)
> >>>>>
> >>>>> I'm not sure what deprecating something means, though -- have it
> raise a deprecation warning in the next version?
> >>>>>
> >>
> >> There was discussion at SciPy 2015 of separating out the text reading
> abilities of Pandas so that numpy could include it. We should contact Jeff
> Reback and see about moving that forward.
> >
> >
> > IIRC Thomas Caswell was interested in doing this :)
>
> When he was in Berkeley a few weeks ago he assured me that every night
> since SciPy he has dutifully been feeling guilty about not having done it
> yet. I think this week his paltry excuse is that he's "on his honeymoon" or
> something.
>
> ...which is to say that if someone has some spare cycles to take this over
> then I think that might be a nice wedding present for him :-).
>
> (The basic idea is to take the text reading backend behind pandas.read_csv
> and extract it into a standalone package that pandas could depend on, and
> that could also be used by other packages like numpy (among others -- I
> think Dato's SFrame package has a fork of this code as well?))
>
> -n
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> I can certainly provide guidance on how/what to extract but don't have
> spare cycles myself for this :(