[Numpy-discussion] Question about improving genfromtxt errors
Pierre GM
pgmdevlist at gmail.com
Mon Sep 28 13:36:07 EDT 2009
On Sep 28, 2009, at 12:51 PM, Skipper Seabold wrote:
> This was probably due to the way that I timed it, honestly. I only
> did it once. The only differences I made for that part were in the
> first post of the thread. Two incremented scalars for line numbers
> and column numbers and a try/except block.
>
> I'm really not against a debug mode if someone wants to do it, and
> it's deemed necessary. If it could be made to log all of the errors
> that would be extremely helpful. I still need to post some of my use
> cases though. Anything to help make data cleaning less of a chore...
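The change described in the quoted message (two incremented counters for line and column numbers plus a try/except block) could look roughly like the sketch below. `convert_rows` and its `converters` argument are hypothetical names chosen for illustration. This is not the actual genfromtxt code, just a minimal version of the same idea.

```python
def convert_rows(lines, converters, delimiter=","):
    """Hypothetical helper: convert each field and report where a failure occurs."""
    rows = []
    lineno = 0  # incremented once per input line
    for line in lines:
        lineno += 1
        colno = 0  # incremented once per field on the current line
        values = []
        try:
            for field in line.split(delimiter):
                colno += 1
                # IndexError if there are more fields than converters
                values.append(converters[colno - 1](field))
        except (ValueError, IndexError) as err:
            # Re-raise with the exact position of the offending field
            raise ValueError(
                f"Could not convert line {lineno}, column {colno}: {err}"
            ) from err
        rows.append(tuple(values))
    return rows


print(convert_rows(["1,2.5", "3,4.0"], [int, float]))
```

The counters cost almost nothing per field. The try/except only does real work when a conversion actually fails.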
I was thinking about something this weekend: we could create a second
list when looping over the rows, in which we would store the length of each
split row. After the loop, we can check whether any of these lengths differ
from the expected number of columns `nbcols`, and where. Then we can decide
either to strip the invalid rows from the `rows` list (which amounts to
skipping them) or to raise an exception, but in both cases we know where the
problem is.
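A minimal sketch of that idea, assuming a simple delimiter split and that every input line becomes a row (no comments or blank lines skipped, so list positions match line numbers). `split_rows` and `skip_invalid` are hypothetical names, not an existing genfromtxt API.

```python
def split_rows(lines, nbcols, delimiter=",", skip_invalid=False):
    """Hypothetical sketch: split lines, record each row's length, validate afterwards."""
    rows = []
    row_lengths = []  # the second list of integers: one length per split row
    for line in lines:
        fields = line.split(delimiter)
        rows.append(fields)
        row_lengths.append(len(fields))

    # After the loop: find the (1-based line number, length) of every bad row.
    invalid = [(i + 1, n) for i, n in enumerate(row_lengths) if n != nbcols]
    if invalid and not skip_invalid:
        details = "; ".join(f"line {ln}: got {n} columns" for ln, n in invalid)
        raise ValueError(f"Expected {nbcols} columns. {details}")

    # Skipping: strip the invalid rows from `rows`.
    bad = {ln - 1 for ln, _ in invalid}
    return [row for i, row in enumerate(rows) if i not in bad]


print(split_rows(["1,2,3", "4,5", "6,7,8"], 3, skip_invalid=True))
```

Because the check happens only once, after the loop, the inner loop stays as cheap as before. Either way, the error message or the list of skipped lines can name every offending line at once, not just the first one.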
My only concern is that we'd be creating yet another list of integers,
which would increase memory usage. Would that be a problem?
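One rough way to gauge that cost, and one possible way to reduce it, is to compare a plain list of lengths with the standard-library `array` module, which stores compact C ints. This is only an assumption about how the lengths might be stored. `sys.getsizeof` measures the container itself and not the objects a list points to. In CPython, small ints (up to 256) are cached, so for typical column counts the list's pointer buffer is most of the cost. `row_length_storage` is a hypothetical helper name.

```python
import sys
from array import array


def row_length_storage(row_lengths):
    """Hypothetical helper: bytes used by a list vs. an array('i') of the same lengths."""
    as_list = list(row_lengths)         # one pointer (8 bytes on 64-bit) per row
    as_array = array("i", row_lengths)  # one C int (typically 4 bytes) per row
    return sys.getsizeof(as_list), sys.getsizeof(as_array)


list_bytes, array_bytes = row_length_storage([3] * 100_000)
print(f"list: {list_bytes} bytes, array('i'): {array_bytes} bytes")
```

In either form the overhead grows with the number of rows. It is still small compared with the `rows` list itself, which holds a list of strings per row.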
In other news, I should eventually be able to tackle that this week...