[Numpy-discussion] Question about improving genfromtxt errors

Pierre GM pgmdevlist at gmail.com
Mon Sep 28 13:36:07 EDT 2009


On Sep 28, 2009, at 12:51 PM, Skipper Seabold wrote:

> This was probably due to the way that I timed it, honestly.  I only
> did it once.  The only differences I made for that part were in the
> first post of the thread.  Two incremented scalars for line numbers
> and column numbers and a try/except block.
>
> I'm really not against a debug mode if someone wants to do it, and
> it's deemed necessary.  If it could be made to log all of the errors
> that would be extremely helpful.  I still need to post some of my use
> cases though.  Anything to help make data cleaning less of a chore...

I was thinking about something this weekend: we could create a second
list when looping on the rows, where we would store the length of each
split row. After the loop, we can check whether any of these values
differ from the expected number of columns `nbcols`, and on which rows.
Then, we can decide to strip the `rows` list of its invalid entries
(which corresponds to skipping) or raise an exception, but in both
cases we know where the problem is.
My only concern is that we'd be creating yet another list of integers,
which would increase memory usage. Would that be a problem?
In other news, I should eventually be able to tackle that this week...
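
To make the idea concrete, here is a rough sketch in plain Python. It is
not the actual genfromtxt code: the function names `record_row_lengths`
and `filter_rows`, the `skip_invalid` flag, and the choice to take
`nbcols` from the first row when it isn't given are all assumptions made
for illustration only.

```python
def record_row_lengths(lines, delimiter=",", nbcols=None):
    """Split each line and record how many fields it produced.

    Hypothetical helper, not part of numpy. Returns the split rows, the
    expected number of columns, and a list of (line_number, length)
    pairs for every row whose length differs from `nbcols`.
    """
    rows = []
    rowlengths = []  # the "second list": one integer per row
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        rows.append(fields)
        rowlengths.append(len(fields))
    if nbcols is None:
        # Assumption: the first row defines the expected width.
        nbcols = rowlengths[0] if rowlengths else 0
    # Line numbers are 1-based so they match what a text editor shows.
    invalid = [(i, n) for i, n in enumerate(rowlengths, 1) if n != nbcols]
    return rows, nbcols, invalid


def filter_rows(lines, delimiter=",", nbcols=None, skip_invalid=False):
    """Either drop the invalid rows or raise, reporting where they are."""
    rows, nbcols, invalid = record_row_lengths(lines, delimiter, nbcols)
    if invalid and not skip_invalid:
        details = "\n".join(
            "    Line #%d (got %d columns instead of %d)" % (i, n, nbcols)
            for i, n in invalid
        )
        raise ValueError("Some rows have the wrong length:\n" + details)
    bad = {i for i, _ in invalid}
    return [row for i, row in enumerate(rows, 1) if i not in bad]
```

With `skip_invalid=True` the offending rows are dropped; otherwise the
exception lists every bad line at once instead of stopping at the first
one. The extra list holds one integer per row, so it should stay small
next to `rows` itself, which holds every split field.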




