
On Tue, Sep 29, 2009 at 4:36 PM, Bruce Southey <bsouthey@gmail.com> wrote: <snip>
Hi, The first case just has to handle a missing delimiter - actually I expect that most of my cases would relate to this. So here is simple Python code to generate an arbitrarily large list with the occasional missing delimiter.
I set it so it reads the desired number of rows and the frequency of bad rows from the Linux command line:
$ time python tbig.py 1000000 100000
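The script itself is not reproduced in this message. What follows is only a minimal sketch of what such a generator might look like, not the contents of tbig.py: the name make_rows, the five-column layout, and the reading of the second argument as "one bad row every N rows" are all assumptions made for illustration.

```python
import random
import sys


def make_rows(n_rows, bad_every, n_cols=5, delimiter=","):
    """Yield n_rows delimited lines (hypothetical helper).

    Every bad_every-th line is missing one delimiter; bad_every=0
    produces only well-formed lines.
    """
    for i in range(1, n_rows + 1):
        fields = [str(random.random()) for _ in range(n_cols)]
        if bad_every and i % bad_every == 0:
            # Fuse the first two fields so the line has one delimiter too few.
            fields[:2] = [fields[0] + fields[1]]
        yield delimiter.join(fields)


# Only runs when called as: python script.py <n_rows> <bad_every>
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in make_rows(int(sys.argv[1]), int(sys.argv[2])):
        print(line)
```

Redirecting the output to a file gives a test input in which the bad rows fall at known, regular positions, which makes it easy to check whether a reader reports the right line.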
If I comment out the extra prints in io.py that I put in, it takes about 22 seconds to finish if the delimiters are correct. If I have the missing delimiter it takes 20.5 seconds to crash.
Bruce
I think this would actually cover most of the problems I was running into. The only other one I can think of is when I used a converter that I thought would work, but it got unexpected data. For example,

from StringIO import StringIO
import numpy as np

strip_rand = lambda x : float(('r' in x.lower() and x.split()[-1]) or (not 'r' in x.lower() and x.strip() or 0.0))

# Example usage
strip_rand('R 40')
strip_rand(' ')
strip_rand('')
strip_rand('40')

strip_per = lambda x : float(('%' in x.lower() and x.split()[0]) or (not '%' in x.lower() and x.strip() or 0.0))

# Example usage
strip_per('7 %')
strip_per('7')
strip_per(' ')
strip_per('')

# Unexpected usage
strip_per('R 1')

s = StringIO('D01N01,10/1/2003 ,1 %,R 75,400,600\r\nL24U05,12/5/2003\
,2 %,1,300, 150.5\r\nD02N03,10/10/2004 ,R 1,,7,145.55')

data = np.genfromtxt(s, converters = {2 : strip_per, 3 : strip_rand}, delimiter=",", dtype=None)

I don't have a clean install right now, but I think this returned a "converter is locked for upgrading" error. I would just like to know where the problem occurred (line and column, preferably not zero-indexed), so I can go and have a look at my data.

One more note: being able to autostrip whitespace turned out to be very helpful. I didn't realize how much memory strings of spaces could take up, and as soon as I turned this on, I was able to process an array with a lot of whitespace without filling up my memory. So I think maybe autostrip should be turned on by default?

I will post anything else if it occurs to me.

Skipper
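To make the "line and column" request concrete, here is a rough sketch of the kind of report I have in mind. It is plain Python and does not touch genfromtxt at all: find_bad_fields is a hypothetical helper, not anything that exists in numpy, and it simply runs each converter over its column and records where it fails, using 1-indexed positions. The sample data is the same as above with the line continuation removed.

```python
# Same converters as in the message above.
strip_rand = lambda x: float(('r' in x.lower() and x.split()[-1]) or
                             (not 'r' in x.lower() and x.strip() or 0.0))
strip_per = lambda x: float(('%' in x.lower() and x.split()[0]) or
                            (not '%' in x.lower() and x.strip() or 0.0))


def find_bad_fields(text, converters, delimiter=","):
    """Return a list of (line, column, value, message) for failed conversions.

    Hypothetical helper; line and column are 1-indexed, while the keys of
    `converters` are 0-indexed column numbers, as in genfromtxt.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split(delimiter)
        for col, convert in sorted(converters.items()):
            if col >= len(fields):
                problems.append((line_no, col + 1, None, "missing column"))
                continue
            try:
                convert(fields[col])
            except (ValueError, IndexError, TypeError) as exc:
                problems.append((line_no, col + 1, fields[col], str(exc)))
    return problems


text = ('D01N01,10/1/2003 ,1 %,R 75,400,600\r\n'
        'L24U05,12/5/2003 ,2 %,1,300, 150.5\r\n'
        'D02N03,10/10/2004 ,R 1,,7,145.55')

problems = find_bad_fields(text, {2: strip_per, 3: strip_rand})
for line_no, col_no, value, message in problems:
    print("line %d, column %d: %r (%s)" % (line_no, col_no, value, message))
```

Here the only failure is the 'R 1' in the third column of the third line, which is what strip_per chokes on. A pre-check like this reads the file twice, so it is only meant for tracking down a bad value after a load has already failed.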