Trying to fix Invalid CSV File
Roel Schroeven
rschroev_nospam_ml at fastmail.fm
Wed Aug 6 05:21:22 EDT 2008
Ryan Rosario wrote:
> Next time I am going to be much more careful. Tab delimited is
> probably better for my purpose, but I can definitely see there being
> issues with invisible tab characters and other weirdness.
No matter which delimiter you use, there will always be data that
includes that delimiter, and you need some way to deal with it.
I prefer the approach that esr suggests in "The Art of Unix Programming"
(http://www.catb.org/~esr/writings/taoup/html/ch05s02.html): define a
delimiter (preferably but not necessarily one that doesn't occur frequently in
your data) and an escape character. On output, escape all occurrences of
delimiter and escape character in your data. On input, you can trivially
and unambiguously distinguish delimiters in the data from delimiters
between data, and unescape everything.
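
A minimal Python sketch of that scheme could look like the following. The
choice of "|" as delimiter and backslash as escape character, and the
function names, are only assumptions for illustration; they come neither
from the book nor from any library. It also assumes one record per line
with no newlines inside the fields (those would need escaping as well).

```python
DELIM = "|"   # assumed delimiter, pick whatever suits your data
ESC = "\\"    # assumed escape character (a single backslash)

def escape_field(field):
    # Escape the escape character first, so the backslashes added
    # for delimiters are not escaped a second time.
    return field.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)

def join_record(fields):
    # Output side: escape every field, then join with the delimiter.
    return DELIM.join(escape_field(f) for f in fields)

def split_record(line):
    # Input side: walk the line once. A character following the escape
    # character is always data; an unescaped delimiter ends the field.
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == ESC:
            # Take the next character literally. A dangling escape at
            # the end of the line is silently dropped in this sketch.
            current.append(next(chars, ""))
        elif ch == DELIM:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

record = ["a|b", "c\\d", "", "plain"]
line = join_record(record)
assert split_record(line) == record
```

Note that a plain line.split(DELIM) would not work on input, since it
cannot tell an escaped delimiter from a real one; hence the
character-by-character loop.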
Cheers,
Roel
--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
-- Isaac Asimov