fixing an horrific formatted csv file.
rosuav at gmail.com
Wed Jul 2 03:20:53 CEST 2014
On Wed, Jul 2, 2014 at 7:41 AM, flebber <flebber.crue at gmail.com> wrote:
> I understand why providing full solutions is frowned upon, because it doesn't assist in learning. Which is true, it's incredibly helpful in this case.
In this case, my main reason for not providing a full solution is that
the work tends to be iterative. When I have a huge and messy file,
what I usually do is grab the first half-dozen lines and work out how
I'd go about fixing them manually, then write a script that does that.
Then run the script on the whole file, and see where it either chokes
or produces wrong data. Pick up the first few lines of wrong data,
figure out how to tweak the program to handle those. Rinse and repeat.
Often, what that results in is a file that gets progressively tidier.
When the scope of the mess is infinite (like with human-entered data -
believe you me, you haven't seen messy until you've seen what a
committee can do to a simple job), this means you stop working on the
script at exactly the point where it stops being worth the effort -
which is something that only you can decide.
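
To make the loop above a bit more concrete, here is a rough sketch of how one pass might look. Everything specific in it is made up for illustration: the three-column layout, the sample data, and the names fix_row and run_pass are all hypothetical, and the real cleanup rules depend entirely on what your file looks like.

```python
import csv
import io

EXPECTED_FIELDS = 3  # assumption: a hypothetical three-column file


def fix_row(row):
    """Return a cleaned copy of row, or raise ValueError if no rule handles it yet."""
    row = [field.strip() for field in row]
    # A rule of the kind you add after an earlier pass: drop trailing empty fields.
    while len(row) > EXPECTED_FIELDS and row[-1] == "":
        row.pop()
    if len(row) != EXPECTED_FIELDS:
        raise ValueError(f"expected {EXPECTED_FIELDS} fields, got {len(row)}")
    return row


def run_pass(lines, max_failures=5):
    """Clean every row we can; keep only the first few failures for inspection."""
    cleaned, failures = [], []
    for rownum, row in enumerate(csv.reader(lines), start=1):
        try:
            cleaned.append(fix_row(row))
        except ValueError as exc:
            if len(failures) < max_failures:
                failures.append((rownum, row, str(exc)))
    return cleaned, failures


# Hypothetical sample standing in for the first few lines of the real file.
sample = io.StringIO("a, b ,c\n1,2,3,,\nbroken line\n")
cleaned, failures = run_pass(sample)
for rownum, row, reason in failures:
    print(f"row {rownum}: {reason}: {row!r}")
```

Each time through, you look at the handful of failures it reports, add or adjust a rule in fix_row, and run it again. When the failures that remain are quicker to fix by hand than by code, that's your cue to stop.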