Fwd: np.loadtxt : yet a new implementation...

(Sorry about that, I pressed "Reply" instead of "Reply all". Not my day for emails...)
On Dec 1, 2008, at 1:54 PM, John Hunter wrote:
It looks like I am doing something wrong -- trying to parse a CSV file with dates formatted like '2008-10-14', with::
    import datetime, sys
    import dateutil.parser
    StringConverter.upgrade_mapper(dateutil.parser.parse, default=datetime.date(1900,1,1))
    r = loadtxt(sys.argv[1], delimiter=',', names=True)
John, The problem you have is that the default dtype is 'float' (for backwards compatibility w/ the original np.loadtxt). What you want is to automatically change the dtype according to the content of your file: you should use dtype=None
r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)
As you'll want a recarray, we could make a np.records.loadtxt function where dtype=None would be the default...
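To make the effect of dtype=None concrete, here is a minimal sketch. It assumes np.genfromtxt from current NumPy as a stand-in for the loadtxt proposed in this thread, and the CSV content and column names are made up for illustration:

```python
import io
import numpy as np

# Hypothetical CSV content standing in for the file passed as sys.argv[1].
csv_text = "date,price,volume\n2008-10-14,1.5,100\n2008-10-15,2.5,200\n"

# dtype=None asks the reader to guess each column's type from its content;
# names=True takes the field names from the first line.
guessed = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                        dtype=None, encoding="utf-8")

# One field per column: a string column, a float column and an integer column.
print(guessed.dtype)
```

Without a date converter the first column stays a string column; the numeric columns are typed individually instead of all being forced to float.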

On Mon, Dec 1, 2008 at 1:14 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
The problem you have is that the default dtype is 'float' (for backwards compatibility w/ the original np.loadtxt). What you want is to automatically change the dtype according to the content of your file: you should use dtype=None
r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)
As you'll want a recarray, we could make a np.records.loadtxt function where dtype=None would be the default...
OK, that worked great. I do think a default implementation in np.rec that returns a recarray would be nice. It might also be nice to have a function like np.rec.fromcsv which defaults to delimiter=',', names=True and dtype=None. Since csv is one of the most common data interchange formats in the world, it would be nice to have an obvious function that works with it with little or no customization required.
Fernando and I have taught a scientific computing course on a number of occasions, and on the last round we taught undergrads. Most of these students have little or no programming experience. Many of them struggle with the concept of an array, and dtypes are a difficult concept. But we found that they responded very well to our csv2rec example, because with no syntactic cruft they were able to load a file and do some stats on the columns, and I would like to see that ease of use preserved.
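A rough sketch of what such a helper could look like. The name fromcsv and its defaults are hypothetical (taken from the suggestion above, not from any existing NumPy function), np.genfromtxt from current NumPy stands in for the loader discussed in this thread, and the sample data is invented:

```python
import io
import numpy as np

def fromcsv(fname, **kwargs):
    """Hypothetical helper: load a CSV file into a recarray with CSV-friendly defaults."""
    kwargs.setdefault("delimiter", ",")
    kwargs.setdefault("names", True)       # field names come from the first line
    kwargs.setdefault("dtype", None)       # guess each column's type from its content
    kwargs.setdefault("encoding", "utf-8")
    return np.genfromtxt(fname, **kwargs).view(np.recarray)

# Invented sample data; a file name would work in place of the StringIO object.
sample = io.StringIO("date,price,volume\n2008-10-14,1.5,100\n2008-10-15,2.5,200\n")
r = fromcsv(sample)

# Columns are available as attributes, so simple stats need no dtype knowledge.
mean_price = r.price.mean()
total_volume = r.volume.sum()
```

Any keyword passed explicitly overrides the defaults, so the same function could still handle a tab-delimited file or a file without a header line.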
JDH

On Dec 1, 2008, at 2:26 PM, John Hunter wrote:
OK, that worked great. I do think a default implementation in np.rec that returns a recarray would be nice. It might also be nice to have a function like np.rec.fromcsv which defaults to delimiter=',', names=True and dtype=None. Since csv is one of the most common data interchange formats in the world, it would be nice to have an obvious function that works with it with little or no customization required.
Quite agreed. Personally, I'd ditch the default dtype=float in favor of dtype=None, but compatibility is an issue. However, if we all agree on genloadtxt, we can use tailor-made versions in different modules, as you suggest.
There's an extra issue for which we have a solution I'm not completely satisfied with: names=True. It might be simpler for basic users not to have to set names=True, and to have the first line recognized as names or not as needed (by processing the first line after the others, and using it as the header if it's found to be a list of names, or inserting it back at the beginning otherwise)...
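One way the "process the first line after the others" idea could be sketched. This is only an illustrative heuristic, not what genloadtxt does; the function names split_header and _is_number are hypothetical:

```python
def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
    except ValueError:
        return False
    return True

def split_header(lines, delimiter=","):
    """Return (names, data_rows); names is None when the first line looks like data."""
    rows = [line.strip().split(delimiter) for line in lines if line.strip()]
    if len(rows) < 2:
        return None, rows
    first, rest = rows[0], rows[1:]
    # Look at the other lines first: which columns are numeric in every one of them?
    numeric_cols = [i for i in range(len(first))
                    if all(i < len(row) and _is_number(row[i]) for row in rest)]
    # Treat the first line as a header only if it holds no number in any of those
    # columns; otherwise it stays at the beginning of the data.
    if numeric_cols and not any(_is_number(first[i]) for i in numeric_cols):
        return [name.strip() for name in first], rest
    return None, rows

names, data = split_header(["date,price", "2008-10-14,1.5", "2008-10-15,2.5"])
no_names, all_rows = split_header(["1,2", "3,4"])
```

A file made only of string columns stays ambiguous under this heuristic and is treated as having no header, which is one reason an explicit names=True remains useful.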
participants (2)
- John Hunter
- Pierre GM