Re: [Numpy-discussion] Question about improving genfromtxt errors

Sept. 28, 2009


On Sep 28, 2009, at 12:51 PM, Skipper Seabold wrote:
...
This was probably due to the way that I timed it, honestly.  I only
did it once.  The only differences I made for that part were in the
first post of the thread.  Two incremented scalars for line numbers
and column numbers and a try/except block.
I'm really not against a debug mode if someone wants to do it, and
it's deemed necessary.  If it could be made to log all of the errors
that would be extremely helpful.  I still need to post some of my use
cases though.  Anything to help make data cleaning less of a chore...
I was thinking about something this weekend: we could create a second
list when looping over the rows, where we would store the length of each
split row. After the loop, we can check whether these values match
the expected number of columns `nbcols` and, if not, where they differ.
Then we can decide either to strip the invalid entries from the `rows`
list (which corresponds to skipping them) or to raise an exception, but
in both cases we know where the problem is.
My only concern is that we'd be creating yet another list of integers,
which would increase memory usage. Would that be a problem?
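
To make the idea concrete, here is a rough sketch in plain Python. It is not the genfromtxt code: the function name `check_row_lengths`, its `skip_invalid` flag and the error message format are made up for illustration, and only `rows` and `nbcols` come from the description above.

```python
def check_row_lengths(lines, delimiter=",", nbcols=None, skip_invalid=False):
    """Split each line, record its length, and report the rows that do not match.

    Hypothetical helper for illustration only; not part of numpy.
    """
    rows = []
    lengths = []  # the "second list": one integer per split row
    for line in lines:
        values = line.strip().split(delimiter)
        rows.append(values)
        lengths.append(len(values))

    # Assumption: if no column count is given, trust the first row.
    if nbcols is None:
        nbcols = lengths[0] if lengths else 0

    # (1-based line number, number of columns found) for every bad row.
    invalid = [(i, n) for i, n in enumerate(lengths, 1) if n != nbcols]

    if invalid and not skip_invalid:
        details = ", ".join(
            "line %d has %d columns" % (i, n) for i, n in invalid
        )
        raise ValueError("Expected %d columns: %s" % (nbcols, details))

    # Skipping: strip the invalid entries from `rows`.
    bad = set(i for i, _ in invalid)
    rows = [row for i, row in enumerate(rows, 1) if i not in bad]
    return rows, invalid


lines = ["1,2,3", "4,5", "6,7,8"]
rows, invalid = check_row_lengths(lines, nbcols=3, skip_invalid=True)
print(rows)
print(invalid)
```

With `skip_invalid=False` the same input raises a `ValueError` that names line 2. In this sketch the extra list costs one integer per row, on top of a `rows` list that already holds every split string.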
In other news, I should eventually be able to tackle that this week...

Pierre GM