[Python-Dev] csv module TODO list
Andrew McNamara
andrewm at object-craft.com.au
Wed Jan 5 08:06:43 CET 2005
There's a bunch of jobs we (CSV module maintainers) have been putting
off - attached is a list (in no particular order):
* unicode support (this will probably uglify the code considerably).
* 8 bit transparency (specifically, allow \0 characters in source string
and as delimiters, etc).
* Reader and universal newlines don't interact well, and the reader
doesn't honour the Dialect's lineterminator setting. All outstanding
bug IDs (789519, 944890, 967934 and 1072404) are related to this. It's
a difficult problem and further discussion is needed.
* compare PEP-305 and library reference manual to the module as implemented
and either document the differences or correct them.
* Address or document Francis Avila's issues as mentioned in this posting:
http://www.google.com.au/groups?selm=vsb89q1d3n5qb1%40corp.supernews.com
* Several blogs complain that the CSV module is no good for parsing
strings. Suggest making it clearer in the documentation that the reader
accepts an iterable, rather than a file, and document why an iterable
(as opposed to a string) is necessary (multi-line records with embedded
newlines). We could also provide an interface that parses a single
string (or the old Object Craft interface) for those that really feel
the need. See:
http://radio.weblogs.com/0124960/2003/09/12.html
http://zephyrfalcon.org/weblog/arch_d7_2003_09_06.html#e335
* Compatibility API for old Object Craft CSV module?
http://mechanicalcat.net/cgi-bin/log/2003/08/18
For example: "from csv.legacy import reader" or something.
* Pure python implementation?
* Some CSV-like formats consider a quoted field a string, and an unquoted
field a number - consider supporting this in the Reader and Writer. See:
http://radio.weblogs.com/0124960/2004/04/23.html
* Add line number and record number counters to reader object?
* It's possible to get the csv parser to suck the whole source file
into memory with an unmatched quote character. Need to limit size of
internal buffer.
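On the string-parsing item above, a minimal sketch of what the documentation could show. The helper names parse_csv_string and parse_line are hypothetical and not part of the csv module. The sketch assumes a Python 3 interpreter and only shows that the reader accepts any iterable of lines, and why a quoted field with an embedded newline needs more than one line of input.

```python
import csv
import io


def parse_csv_string(text, **fmtparams):
    """Parse a whole CSV document held in one string (hypothetical helper).

    io.StringIO with newline="" yields the lines unchanged, so a newline
    inside a quoted field reaches the reader exactly as written.
    """
    return list(csv.reader(io.StringIO(text, newline=""), **fmtparams))


def parse_line(line, **fmtparams):
    """Parse one self-contained record (hypothetical helper).

    Wrapping the string in a list gives the reader the iterable it wants.
    This only suits records with no embedded newlines.
    """
    return next(csv.reader([line], **fmtparams))


# One record spans two physical lines because of the quoted newline.
rows = parse_csv_string('a,"b\nc",d\r\n1,2,3\r\n')
fields = parse_line('x,"y,z"')
```

The reader pulls a second line from the iterable when the first one ends inside a quoted field, which is the behaviour a plain single-string interface cannot provide.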
Also, review comments from Neal Norwitz, 22 Mar 2003 (some of these should
already have been addressed):
* remove TODO comment at top of file--it's empty
* is CSV going to be maintained outside the python tree?
If not, remove the 2.2 compatibility macros for:
PyDoc_STR, PyDoc_STRVAR, PyMODINIT_FUNC, etc.
* inline the following functions since they are used only in one place
get_string, set_string, get_nullchar_as_None, set_nullchar_as_None,
join_reset (maybe)
* rather than use PyErr_BadArgument, should you use assert?
(first example, Dialect_set_quoting, line 218)
* is it necessary to have Dialect_methods, can you use 0 for tp_methods?
* remove commented out code (PyMem_DEL) on line 261
Have you used valgrind on the test to find memory overwrites/leaks?
* PyString_AsString()[0] on line 331 could return NULL in which case
you are dereferencing a NULL pointer
* not sure why there are casts on 0 pointers
lines 383-393, 733-743, 1144-1154, 1164-1165
* Reader_getiter() can be removed and use PyObject_SelfIter()
* I think you need PyErr_NoMemory() before returning on lines 768 and 1178
* is PyString_AsString(self->dialect->lineterminator) on line 994
guaranteed not to return NULL? If not, it could crash by
passing to memmove.
* PyString_AsString() can return NULL on line 1048 and 1063,
the result is passed to join_append()
* iteratable should be iterable? (line 1088)
* why doesn't csv_writerows() have a docstring? csv_writerow does
* any PyUnicode_* methods should be protected with #ifdef Py_USING_UNICODE
* csv_unregister_dialect, csv_get_dialect could use METH_O
so you don't need to use PyArg_ParseTuple
* in init_csv, recommend using
PyModule_AddIntConstant and PyModule_AddStringConstant
where appropriate
Also, review comments from Jeremy Hylton, 10 Apr 2003:
I've been reviewing extension modules looking for C types that should
participate in garbage collection. I think the csv ReaderObj and
WriterObj should participate. The ReaderObj contains a reference to
input_iter that could be an arbitrary Python object. The iterator
object could well participate in a cycle that refers to the ReaderObj.
The WriterObj has a reference to a writeline callable, which could well
be a method of an object that also points to the WriterObj.
The Dialect object appears to be safe, because the only PyObject * it
refers to should be a string. Safe until someone creates an insane string
subclass <0.4 wink>.
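The writer cycle described above can be sketched from the Python side. The Sink class is hypothetical and exists only for illustration. The check assumes a CPython build in which the writer object participates in garbage collection. Without that, the cycle would survive gc.collect().

```python
import csv
import gc
import weakref


class Sink:
    """Hypothetical output object that keeps a reference to its own writer."""

    def __init__(self):
        self.lines = []
        # The writer keeps the bound method self.write, which refers back
        # to this object: Sink -> writer -> bound method -> Sink.
        self.writer = csv.writer(self)

    def write(self, text):
        self.lines.append(text)


def cycle_is_collected():
    """Return True if the Sink/writer cycle is reclaimed by the collector."""
    sink = Sink()
    sink.writer.writerow(["a", "b"])
    probe = weakref.ref(sink)
    del sink          # only the cycle keeps the objects alive now
    gc.collect()      # reference counting alone cannot free a cycle
    return probe() is None
```

If the writer type did not support collection, cycle_is_collected() would return False and the pair of objects would leak.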
Also, an unrelated comment about the code: the lineterminator of the
Dialect is managed by a collection of little helper functions like
get_string, set_string, etc. These helpers appear to be excessively
general; since each is called only once, it seems clearer to inline the
logic directly in the get/set methods for the lineterminator.
--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/