[Numpy-discussion] Possible roadmap addendum: building better text file readers

Paulo Jabardo pjabardo at yahoo.com.br
Mon Feb 27 13:00:09 EST 2012


I don't know what the best solution is, but this certainly isn't madness.

First of all, '.' isn't international notation; it is used in some countries. In most of Europe (and Latin America) the comma is used. Anyone in a country that uses the comma as the decimal separator will very often stumble upon text files with commas as decimal separators. Usually a simple search and replace is sufficient, but if the data has string fields, one might mess up the data.
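To illustrate the string-field problem, here is a minimal sketch that converts the decimal comma field by field instead of doing a global search and replace. It uses only the Python standard library; the names parse_field and read_rows, the ';' delimiter, and the sample data are assumptions made up for this example, not an existing numpy or R API. It does not handle thousands separators, and fields such as "nan" or "inf" would also be read as floats.

```python
import csv
import io


def parse_field(field, decimal=","):
    """Return the field as a float if it parses as a number, else unchanged.

    Hypothetical helper: the decimal separator is swapped for '.' only on a
    temporary copy, so text fields that contain commas are left intact.
    """
    candidate = field.strip().replace(decimal, ".")
    try:
        return float(candidate)
    except ValueError:
        # Not a number (e.g. a city name containing a comma): keep as is.
        return field


def read_rows(text, delimiter=";", decimal=","):
    """Split delimited text into rows and convert numeric fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[parse_field(f, decimal) for f in row] for row in reader]


# Made-up sample: one text column that itself contains a comma,
# followed by two numeric columns written with decimal commas.
sample = "Sao Paulo, SP;23,5;1,25\nRio de Janeiro, RJ;27,0;0,5\n"
rows = read_rows(sample)
```

A global replace of ',' with '.' on the same sample would turn "Sao Paulo, SP" into "Sao Paulo. SP", which is exactly the kind of damage described above.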

Is this the most important feature? Of course not, but it helps a lot. As a matter of fact, one of the reasons I started to use R years ago was the flexibility of the function read.table: I don't have to worry about tabular data in text files, I know I can read them (most of the time...). Now, I use rpy to call read.table.

As for speed, right now read.table is faster than loadtxt. Of course numpy shouldn't simply reproduce every feature found in R (or matlab, scilab, etc.), but reading data from external sources is a very important step in any data analysis (and often a difficult one). So while this feature is not a top priority, it is important for anyone who has to deal with external data written by other programs that use the "correct" locale, and it is certainly not on the path to madness.

I have been thinking for a while about writing/porting a read.table equivalent, but unfortunately I haven't had much time in the past few months, and because of that I have more or less put my transition from R to python on hold.

Paulo


________________________________
From: Alan G Isaac <alan.isaac at gmail.com>
To: Discussion of Numerical Python <numpy-discussion at scipy.org>
Sent: Monday, February 27, 2012 12:53
Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers
 
On 2/27/2012 10:10 AM, Paulo Jabardo wrote:
> I have a few features that I believe would make text file reading easier for many people. In some countries (most?) the decimal separator in real numbers is not a point but a comma.
> I think it would be very useful that the decimal separator be specified with a keyword argument (decimal = '.' for example) on the text reading function.


Down that path lies madness.

For a fast reader, just document input format to use
"international notation" (i.e., the decimal point)
and give the user the responsibility to ensure the
data are in the right format.

The format translation utilities should be separate,
and calling them should be optional.

fwiw,
Alan Isaac
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion