[Numpy-discussion] deprecate fromstring() for text reading?

Chris Barker chris.barker at noaa.gov
Mon Nov 2 18:44:06 EST 2015


On Tue, Oct 27, 2015 at 7:30 AM, Benjamin Root <ben.v.root at gmail.com> wrote:

> FWIW, when I needed a fast Fixed Width reader
>

was there potentially no whitespace between fields in that case? In which
case, it really is a different use-case than delimited text -- if it's at
all common, a version written in C would be nice and fast, and not hard to
do.

But fromstring never would have helped you with that anyway :-)
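
For reference, here is a minimal sketch of the fixed-width case described
above, where fields abut with no separator. It assumes np.genfromtxt() is
given a sequence of integer field widths as its `delimiter` argument; the
sample data and the widths (3, 4, 2) are made up for illustration.

```python
import io

import numpy as np

# Hypothetical sample: three fixed-width columns (3, 4 and 2 characters)
# packed together with no whitespace between fields.
text = "00112.507\n04203.509\n"
widths = (3, 4, 2)

# A sequence of ints as `delimiter` tells genfromtxt to slice each line
# into fields of those widths instead of splitting on a separator.
data = np.genfromtxt(io.StringIO(text), delimiter=widths, dtype=float)

print(data.shape)
print(data)
```

This runs in pure Python, so it shows the behaviour rather than the speed a
C version would give.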

-CHB



> for a very large dataset last year, I found that np.genfromtxt() was
> faster than pandas' read_fwf(). IIRC, pandas' text reading code fell back
> to pure python for fixed width scenarios.
>
> On Fri, Oct 23, 2015 at 8:22 PM, Chris Barker - NOAA Federal <
> chris.barker at noaa.gov> wrote:
>
>> Grabbing the pandas csv reader would be great, and I hope it happens
>> sooner rather than later, though alas, I haven't the spare cycles for it
>> either.
>>
>> In the meantime though, can we put a deprecation Warning in when using
>> fromstring() on text files? It's really pretty broken.
>>
>> -Chris
>>
>> On Oct 23, 2015, at 4:02 PM, Jeff Reback <jeffreback at gmail.com> wrote:
>>
>>
>>
>> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Oct 23, 2015 3:30 PM, "Jeff Reback" <jeffreback at gmail.com> wrote:
>> >
>> > On Oct 23, 2015, at 6:13 PM, Charles R Harris <
>> charlesr.harris at gmail.com> wrote:
>> >
>> >>
>> >>
>> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal <
>> chris.barker at noaa.gov> wrote:
>> >>>
>> >>>
>> >>>> I think it would be good to keep the usage to read binary data at
>> least.
>> >>>
>> >>>
>> >>> Agreed -- it's only the text file reading I'm proposing to deprecate.
>> It was kind of weird to cram it in there in the first place.
>> >>>
>> >>> Oh, fromfile() has the same issues.
>> >>>
>> >>> Chris
>> >>>
>> >>>
>> >>>> Or is there a good alternative to `np.fromstring(<bytes>,
>> dtype=...)`?  -- Marten
>> >>>>
>> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker <chris.barker at noaa.gov>
>> wrote:
>> >>>>>
>> >>>>> There was just a question about a bug/issue with scipy.fromstring
>> (which is numpy.fromstring) when used to read integers from a text file.
>> >>>>>
>> >>>>>
>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html
>> >>>>>
>> >>>>> fromstring() is buggy and inflexible for reading text files --
>> and it is a very, very ugly mess of code. I dug into it a while back, and
>> gave up -- just too much of a mess!
>> >>>>>
>> >>>>> So we really should completely re-implement it, or deprecate it. I
>> doubt anyone is going to do a big refactor, so that means deprecating it.
>> >>>>>
>> >>>>> Also -- if we do want a fast "read numbers from text files" function
>> (which would be nice, actually), it really should get a new name anyway.
>> >>>>>
>> >>>>> (and the hopefully coming new dtype system would make it easier to
>> write cleanly)
>> >>>>>
>> >>>>> I'm not sure what deprecating something means, though -- have it
>> raise a deprecation warning in the next version?
>> >>>>>
>> >>
>> >> There was discussion at SciPy 2015 of separating out the text reading
>> abilities of Pandas so that numpy could include it. We should contact Jeff
>> Reback and see about moving that forward.
>> >
>> >
>> > IIRC Thomas Caswell was interested in doing this :)
>>
>> When he was in Berkeley a few weeks ago he assured me that every night
>> since SciPy he has dutifully been feeling guilty about not having done it
>> yet. I think this week his paltry excuse is that he's "on his honeymoon" or
>> something.
>>
>> ...which is to say that if someone has some spare cycles to take this
>> over then I think that might be a nice wedding present for him :-).
>>
>> (The basic idea is to take the text reading backend behind
>> pandas.read_csv and extract it into a standalone package that pandas could
>> depend on, and that could also be used by other packages like numpy (among
>> others -- I think dato's SFrame package has a fork of this code as well?))
>>
>> -n
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> I can certainly provide guidance on how/what to extract but don't have
>> spare cycles myself for this :(

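To make the text-versus-binary split in the quoted thread concrete, here is
a rough sketch of how the two jobs can be kept apart without fromstring().
For raw bytes it uses np.frombuffer(), which is one possible answer to the
`np.fromstring(<bytes>, dtype=...)` question quoted above. For text it
parses tokens explicitly, so malformed input raises an error. The helper
name parse_ints() is hypothetical and not part of numpy.

```python
import numpy as np


def parse_ints(text):
    """Hypothetical helper: parse whitespace-separated integers strictly.

    int() raises ValueError on any malformed token, so bad input is
    reported instead of being silently dropped or truncated.
    """
    return np.array([int(tok) for tok in text.split()], dtype=np.int64)


# Text path: explicit tokenising and conversion.
values = parse_ints("1 2 3\n4 5")

# Binary path: interpret an existing bytes object as an array.
raw = np.arange(4, dtype=np.int32).tobytes()
view = np.frombuffer(raw, dtype=np.int32)  # read-only view, no copy
writable = view.copy()                     # copy if it must be modified
```

Note that frombuffer() on a bytes object returns a read-only array, hence
the explicit copy() when the data needs to be changed.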

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov