[Python-Dev] csv module TODO list

Andrew McNamara andrewm at object-craft.com.au
Wed Jan 5 08:06:43 CET 2005


There's a bunch of jobs we (CSV module maintainers) have been putting
off - attached is a list (in no particular order): 

* unicode support (this will probably uglify the code considerably).

* 8 bit transparency (specifically, allow \0 characters in source string
  and as delimiters, etc).

* Reader and universal newlines don't interact well, and the reader
  doesn't honour the Dialect's lineterminator setting. All outstanding
  bug IDs (789519, 944890, 967934 and 1072404) are related to this. It's
  a difficult problem and further discussion is needed.

* compare PEP-305 and library reference manual to the module as implemented
  and either document the differences or correct them.

* Address or document Francis Avila's issues as mentioned in this posting:

    http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com

* Several blogs complain that the CSV module is no good for parsing
  strings. Suggest making it clearer in the documentation that the reader
  accepts an iterable, rather than a file, and document why an iterable
  (as opposed to a string) is necessary (multi-line records with embedded
  newlines). We could also provide an interface that parses a single
  string (or the old Object Craft interface) for those that really feel
  the need. See:

    http://radio.weblogs.com/0124960/2003/09/12.html
    http://zephyrfalcon.org/weblog/arch_d7_2003_09_06.html#e335

* Compatibility API for old Object Craft CSV module?

    http://mechanicalcat.net/cgi-bin/log/2003/08/18

  For example: "from csv.legacy import reader" or something.

* Pure python implementation? 

* Some CSV-like formats consider a quoted field a string, and an unquoted
  field a number - consider supporting this in the Reader and Writer. See:

    http://radio.weblogs.com/0124960/2004/04/23.html

* Add line number and record number counters to reader object?

* It's possible to get the csv parser to suck the whole source file
  into memory with an unmatched quote character. Need to limit size of
  internal buffer.
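
To illustrate two of the items above (parsing a string, and line versus
record counters), here is a minimal sketch. It assumes a current Python 3
interpreter; CountingLines and parse_csv_string are hypothetical names
invented for this example and are not part of the csv module. The sketch
splits the string into lines, keeping the line endings, and hands the
resulting iterable to csv.reader, which can then pull a second line when
a quoted field contains an embedded newline.

```python
import csv


class CountingLines:
    """Hypothetical helper: iterate over lines, counting how many were consumed."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.line_count = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._lines)
        self.line_count += 1
        return line


def parse_csv_string(text):
    """Parse CSV data held in one string.

    Returns (records, number of physical lines read).
    """
    # keepends=True preserves the newline that sits inside the quoted field.
    source = CountingLines(text.splitlines(keepends=True))
    records = list(csv.reader(source))
    return records, source.line_count


# The second record spans two physical lines because of the embedded newline.
text = 'name,comment\r\n"Smith, J","first line\nsecond line"\r\n'
records, line_count = parse_csv_string(text)

for number, record in enumerate(records, start=1):
    print(number, record)
print("physical lines read:", line_count)
```

The record count and the physical line count differ here, which is why
the two counters are listed separately in the TODO item.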

Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
already have been addressed):

* remove TODO comment at top of file--it's empty
* is CSV going to be maintained outside the python tree?
  If not, remove the 2.2 compatibility macros for:
         PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.
* inline the following functions since they are used only in one place
        get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
        join_reset (maybe)
* rather than use PyErr_BadArgument, should you use assert?
        (first example, Dialect_set_quoting, line 218)
* is it necessary to have Dialect_methods, can you use 0 for tp_methods?
* remove commented out code (PyMem_DEL) on line 261
        Have you used valgrind on the test to find memory overwrites/leaks?
* PyString_AsString()[0] on line 331 could return NULL in which case
        you are dereferencing a NULL pointer
* not sure why there are casts on 0 pointers
        lines 383-393, 733-743, 1144-1154, 1164-1165
* Reader_getiter() can be removed and use PyObject_SelfIter()
* I think you need PyErr_NoMemory() before returning on line 768, 1178
* is PyString_AsString(self->dialect->lineterminator) on line 994
        guaranteed not to return NULL?  If not, it could crash by
        passing to memmove.
* PyString_AsString() can return NULL on line 1048 and 1063, 
        the result is passed to join_append()
* iteratable should be iterable?  (line 1088)
* why doesn't csv_writerows() have a docstring?  csv_writerow does
* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE
* csv_unregister_dialect, csv_get_dialect could use METH_O 
        so you don't need to use PyArg_ParseTuple
* in init_csv, recommend using 
        PyModule_AddIntConstant and PyModule_AddStringConstant
        where appropriate

Also, review comments from Jeremy Hylton, 10 Apr 2003:

    I've been reviewing extension modules looking for C types that should
    participate in garbage collection.  I think the csv ReaderObj and
    WriterObj should participate.  The ReaderObj contains a reference to
    input_iter that could be an arbitrary Python object.  The iterator
    object could well participate in a cycle that refers to the ReaderObj.
    The WriterObj has a reference to a writeline callable, which could well
    be a method of an object that also points to the WriterObj.

    The Dialect object appears to be safe, because the only PyObject * it
    refers to should be a string.  Safe until someone creates an insane string
    subclass <0.4 wink>.

    Also, an unrelated comment about the code, the lineterminator of the
    Dialect is managed by a collection of little helper functions like
    get_string, set_string, etc.  This code appears to be excessively
    general; since they're called only once, it seems clearer to inline the
    logic directly in the get/set methods for the lineterminator.
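
A Python-level sketch of the kind of cycle Jeremy describes may help when
testing this. It assumes a current Python 3 interpreter; Source is a
hypothetical class invented for the example. The object is its own
iterator, so the reader's input_iter refers back to the object that owns
the reader. Whether the cycle is reclaimed depends on the reader type
participating in garbage collection in the interpreter being used.

```python
import csv
import gc
import weakref


class Source:
    """Hypothetical line source that owns the reader which iterates over it."""

    def __init__(self, lines):
        self._lines = iter(lines)
        # The reader keeps a reference to self (its input iterator) and
        # self keeps a reference to the reader: a reference cycle.
        self.reader = csv.reader(self)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._lines)


src = Source(["a,b\n", "c,d\n"])
rows = list(src.reader)

# Reader objects may not support weak references, so watch the owner instead.
watcher = weakref.ref(src)
del src
gc.collect()

# True only if the collector was able to break the Source <-> reader cycle.
collected = watcher() is None
print(rows, collected)
```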

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

