Module for reading CSV data
logiplexsoftware at earthlink.net
Mon Nov 12 20:25:12 CET 2001
On Saturday 10 November 2001 01:03, Ian Parker wrote:
> The ASV module by Laurence Tratt handles csv files very well. IIRC it
> doesn't fall apart on quoted strings containing commas.
As long as everyone else is plugging CSV modules, I may as well point you to
a module I wrote a few months ago for importing CSV files:
It is poorly documented and full of re.ugliness, but it does some things that
I haven't seen in any other CSV importer:
- Not limited to using commas as delimiters
- It can guess the delimiter
- It can guess the text qualifier (single or double quotes)
- It can guess whether the first row is a header
- It handles quoted delimiters
- It handles quoted newlines
- It can handle inconsistent quoting (Excel, for instance, only quotes data
that requires it, i.e. data containing delimiters or newlines, whereas some
other programs quote everything).
- It has an optional dialog (using wxPython) for previewing the data prior to
import (à la MS Excel) and for letting the user change the guessed parameters.
- It's reasonably fast, considering the amount of data analysis it does. The
heuristics analyze the smallest portion of the file they can get away with,
so increasing the file size won't usually increase the time spent in the
guessing steps (although it will obviously affect the overall time to import).
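
For illustration, here is a minimal sketch of the same kind of guessing
(delimiter, text qualifier, header row) done on a small sample of the input.
It is not the module described above: it assumes the csv.Sniffer class from
the Python standard library, which postdates this message, and the names
SAMPLE, guess_and_parse and sample_size are hypothetical.

```python
import csv
import io

# Hypothetical sample data: semicolon-delimited, with a quoted delimiter,
# a doubled quote character, and a quoted newline.
SAMPLE = (
    'name;comment;qty\n'
    '"Smith; John";"likes ""quotes""";3\n'
    'Jones;"two\nlines";12\n'
)


def guess_and_parse(text, sample_size=1024):
    """Guess the CSV parameters from a prefix of text, then parse all of it.

    Only the first sample_size characters are analyzed, so the cost of the
    guessing step does not grow with the size of the input.
    """
    sample = text[:sample_size]
    sniffer = csv.Sniffer()
    # Restrict the candidate delimiters; this list is an assumption.
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)
    # newline="" leaves newlines inside quoted fields untouched.
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
    return dialect, has_header, rows


dialect, has_header, rows = guess_and_parse(SAMPLE)
print("delimiter:", repr(dialect.delimiter))
print("text qualifier:", repr(dialect.quotechar))
print("first row looks like a header:", has_header)
for row in rows:
    print(row)
```

The guesses are heuristic and can be wrong on unusual data, so a caller
would normally be allowed to override them.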
The guessing steps seem to be reliable, but can be skipped and set
- When Python 2.0 first came out, the new sre module somehow broke my
regular expressions. I believe this was fixed, but I can't recall what
condition caused the error, so I can't be sure. My recent tests seem to
indicate that everything is working properly.
- The code for the wxDialog is ugly (I was planning on creating a wizard-like
series of dialogs, but never got around to it). What is there is usable
- The code for the guessing heuristics is poorly documented and fairly dense.
On the other hand, this code was used in a production environment without a
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308
(800) 735-0555 x308