[Numpy-discussion] Fast Reading of ASCII files

Wed Dec 14 14:22:23 EST 2011

On Wed, Dec 14, 2011 at 4:11 PM, Bruce Southey <bsouthey at gmail.com> wrote:

> On 12/14/2011 01:03 AM, Chris Barker wrote:
>
>
>
> On Tue, Dec 13, 2011 at 1:21 PM, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>
>>
>>   genfromtxt sure looks close for an API
>>>
>>
>> This I don't agree with. It has a huge amount of keywords that just
>> confuse or intimidate a beginning user. There should be a dead simple
>> interface, even the loadtxt API is on the heavy side.
>>
>
> well, yes, though it does do a lot -- do you have a simpler one in mind?
>
> Just looking at what I normally wouldn't need for simple data files and/or
> what a beginning user won't understand at once, the `unpack` and `ndmin`
> keywords could certainly be left out. `converters` is also questionable.
> That's probably as simple as it can get.

Note that I don't think this should be changed now, that's not worth the
trouble.

> But anyway, the really simple cases are really simple, even with
> genfromtxt.
>
> I guess it's a matter of debate about what is a better API:
>
> a few functions, each adding a layer of sophistication
>
> or
>
> one function, with layers of sophistication added with an array of keyword
> arguments.
>
> There's always a trade-off, but looking at the docstring for genfromtxt
> should make it an easy call in this case.

> In either case, though, I wish the various functions were built on the
> same, well-optimized core code.
>
> I wish that too, but I'm fairly certain that you can't write that core
> code with the ability to handle missing and irregular data and make it
> close to the same speed as an optimized reader for regular data.
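One way to keep a fast path for regular data without a single do-everything core is to layer the readers: try `np.loadtxt` first and fall back to `np.genfromtxt` only when it fails. This is a sketch under the assumption that a missing field makes `np.loadtxt` raise `ValueError` (the case for an empty field in a float column); `read_table` is a hypothetical name, and rows with a varying number of columns are not handled.

```python
import io
import numpy as np

def read_table(text, delimiter=","):
    """Hypothetical layered reader: fast path for regular numeric data,
    slower genfromtxt path when values are missing."""
    try:
        # Fast path: loadtxt expects every field to be present and numeric.
        return np.loadtxt(io.StringIO(text), delimiter=delimiter)
    except ValueError:
        # Fallback: genfromtxt fills missing float fields with NaN by default.
        return np.genfromtxt(io.StringIO(text), delimiter=delimiter)

regular = read_table("1,2,3\n4,5,6\n")
irregular = read_table("1,2,3\n4,,6\n")
print(regular)
print(irregular)  # the empty middle field of the second row becomes nan
```

The price of this design is that input with gaps is parsed twice, once by the failing fast path and once by the fallback.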

> I am not sure that you can even create a simple API here, as even Python's
> csv module is rather complex, especially when it just reads data as strings.
> It also 'hides' many arguments in the Dialect class, although these are just
> the collection of 7 'fmtparam' arguments. It also provides the Sniffer
> class, which tries to find the correct format that can then be passed to the
> reader function. Then you still have to convert the data into the required
> types: another set of arguments, as well as yet another pass through the
> data.
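Roughly, the standard-library workflow described here breaks into three separate steps: sniff a dialect, read rows as strings, then convert them in another pass. The semicolon-delimited sample is invented, and restricting the Sniffer to a few candidate delimiters is an assumption made to keep its guess reliable.

```python
import csv
import io
import numpy as np

text = "a;b;c\n1;2.5;3\n4;5.5;6"

# Step 1: the Sniffer guesses a Dialect (delimiter, quoting, ...) from a sample.
dialect = csv.Sniffer().sniff(text, delimiters=",;\t")

# Step 2: the reader yields every field as a string.
rows = list(csv.reader(io.StringIO(text), dialect))
header, body = rows[0], rows[1:]

# Step 3: a separate pass converts the strings to the required type.
values = np.array(body, dtype=float)
print(dialect.delimiter, header, values.shape)
```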
>
> In comparison, genfromtxt can perform sniffing
>

I assume you mean the ``dtype=None`` example in the docstring? That works
to some extent, but you still need to specify the delimiter. I commented on
that on the loadtable PR.
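For reference, a small example of the `dtype=None` behaviour: genfromtxt guesses a type per column, but the comma delimiter still has to be passed explicitly. The data is made up; `encoding=None` is given so that text columns come back as `str` rather than `bytes`.

```python
import io
import numpy as np

text = "id,score,label\n1,0.5,foo\n2,1.5,bar\n"

# dtype=None asks genfromtxt to infer each column's type;
# names=True takes the field names from the first line.
arr = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=None,
                    names=True, encoding=None)
print(arr.dtype)  # an integer, a float and a string field
```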

> and both genfromtxt and loadtxt can read and convert the data. These also
> add some useful features, like skipping rows (start, end and commented) and
> columns. However, it would be possible to create a sniffer function and a
> single data reader function, which together make up a 'simple' reader
> function, but that probably would not change the API of the underlying data
> reader function.
>
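The row- and column-skipping features mentioned here correspond to genfromtxt keywords: `skip_header` and `skip_footer` for rows at the start and end, `comments` for commented lines, and `usecols` for columns. The sample file contents are invented for illustration.

```python
import io
import numpy as np

text = """Report header line
# a commented row
1 2 3
4 5 6
end of data here
"""

arr = np.genfromtxt(io.StringIO(text),
                    skip_header=1,   # drop the first line
                    skip_footer=1,   # drop the last line
                    comments="#",    # ignore text after '#'
                    usecols=(0, 2))  # keep only the first and third columns
print(arr)  # [[1. 3.] [4. 6.]]
```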

Better auto-detection of things like delimiters would indeed be quite
useful.
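A rough sketch of the "sniffer plus single reader" idea, assuming the standard library's `csv.Sniffer` is good enough to guess the delimiter from a few lines. `sniff_and_load` is a hypothetical name and the candidate delimiters are an arbitrary choice; the underlying `np.loadtxt` call is left untouched, in line with the point that the reader's own API would not need to change.

```python
import csv
import io
import numpy as np

def sniff_and_load(text, candidates=",;\t ", sample_lines=5):
    """Hypothetical 'simple' reader: guess the delimiter from the first
    few lines, then hand off to the unchanged np.loadtxt."""
    sample = "\n".join(text.splitlines()[:sample_lines])
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        delimiter = None  # no guess: use loadtxt's default (any whitespace)
    if delimiter == " ":
        delimiter = None  # treat runs of spaces as a single separator
    return np.loadtxt(io.StringIO(text), delimiter=delimiter)

print(sniff_and_load("1,2,3\n4,5,6\n"))
print(sniff_and_load("1;2\n3;4\n"))
```

Auto-detection like this is only a heuristic: a column of decimal numbers written with commas, for example, could easily fool it, which is why an explicit `delimiter` should still take precedence.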

Ralf