
On Sep 28, 2009, at 12:51 PM, Skipper Seabold wrote:
This was probably due to the way that I timed it, honestly. I only did it once. The only differences I made for that part were in the first post of the thread. Two incremented scalars for line numbers and column numbers and a try/except block.
I'm really not against a debug mode if someone wants to do it, and it's deemed necessary. If it could be made to log all of the errors that would be extremely helpful. I still need to post some of my use cases though. Anything to help make data cleaning less of a chore...
I was thinking about something this weekend: we could create a second list when looping on the rows, where we would store the length of each split row. After the loop, we can check whether any of these values differ from the expected number of columns `nbcols`, and on which rows. Then we can decide either to strip the invalid entries from the `rows` list (which corresponds to skipping) or to raise an exception, but in both cases we know where the problem is. My only concern is that we'd be creating yet another list of integers, one per row, which would increase memory usage. Would that be a problem? In other news, I should eventually be able to tackle that this week...
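
To make the idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not the actual loader code: the function name `split_rows` and the `delimiter` and `skip_invalid` arguments are hypothetical, and only `rows` and `nbcols` come from the description above.

```python
def split_rows(lines, nbcols, delimiter=None, skip_invalid=False):
    """Split each line and report rows whose column count is not `nbcols`.

    Hypothetical helper for illustration only.
    Returns (rows, invalid), where `invalid` is a list of
    (row_index, number_of_columns) pairs for the offending rows.
    """
    rows = []
    lengths = []  # the "second list": one integer per row
    for line in lines:
        fields = line.split(delimiter)
        rows.append(fields)
        lengths.append(len(fields))

    # After the loop, locate every row with an unexpected column count.
    invalid = [(i, n) for i, n in enumerate(lengths) if n != nbcols]

    if invalid and not skip_invalid:
        details = ", ".join(
            "line %d (got %d columns)" % (i + 1, n) for i, n in invalid
        )
        raise ValueError(
            "Expected %d columns; problems at: %s" % (nbcols, details)
        )

    # Skipping: strip the invalid rows but still report where they were.
    bad = set(i for i, _ in invalid)
    rows = [row for i, row in enumerate(rows) if i not in bad]
    return rows, invalid
```

With `skip_invalid=True`, a call such as `split_rows(["1 2 3", "4 5", "6 7 8"], 3, skip_invalid=True)` keeps the first and third rows and reports the second as `(1, 2)`. With the default, the same input raises a `ValueError` naming line 2. On the memory question, the sketch stores one integer per row, which is small next to the list of split fields it sits beside, but that is an expectation rather than a measurement.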