Some csv oddities

Thu Nov 27 02:00:31 EST 2003

Three things that seem odd with csv:

Python 2.3.2 (#49, Oct  2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> class unix(csv.Dialect):
...     delimiter = ':'
...     escapechar = '\\'
...     lineterminator = '\n'
...     quoting = csv.QUOTE_NONE
...     skipinitialspace = False
...
>>> csv.register_dialect('unix', unix)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "D:\PYTHON23\lib\csv.py", line 39, in __init__
    raise Error, "Dialect did not validate: %s" % ", ".join(errors)
_csv.Error: Dialect did not validate: doublequote parameter must be True or False

It seems to me that QUOTE_NONE makes doublequote meaningless, because no
quote character is in use.  csv.writer appears to agree: it does not escape
the quotechar, so with the above dialect (once it validates)
cw.writerow(['1', '"', '3']) writes the raw bytes '1:":3\n'.  The
corresponding csv.reader, however, chokes on those bytes.

Untested patch (csv.py, line 64):

         if self.doublequote not in (True, False):
-            errors.append("doublequote parameter must be True or False")
+            if self.quoting != QUOTE_NONE:
+                errors.append("doublequote parameter must be True or False")
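
For what it's worth, here is the intent of that patch as a stand-alone
sketch.  It assumes a current Python 3 interpreter rather than the 2.3.2
used above, and validate_dialect, NoQuoting and MinimalQuoting are made-up
names for illustration, not part of csv.  Only the doublequote check is
modelled:

```python
import csv


def validate_dialect(dialect):
    """Hypothetical check mirroring the patched doublequote test.

    Returns a list of error strings; an empty list means the check passed.
    """
    errors = []
    doublequote = getattr(dialect, "doublequote", None)
    if doublequote not in (True, False):
        # Proposed change: only complain when quoting is actually in use.
        if getattr(dialect, "quoting", None) != csv.QUOTE_NONE:
            errors.append("doublequote parameter must be True or False")
    return errors


class NoQuoting:
    # Plain attribute holder, deliberately without a doublequote attribute.
    delimiter = ':'
    escapechar = '\\'
    lineterminator = '\n'
    quoting = csv.QUOTE_NONE
    skipinitialspace = False


class MinimalQuoting(NoQuoting):
    # Same settings, but quoting is enabled, so doublequote matters again.
    quoting = csv.QUOTE_MINIMAL


print(validate_dialect(NoQuoting))
print(validate_dialect(MinimalQuoting))
```

With the extra test in place, the QUOTE_NONE class should pass without a
doublequote attribute, while the QUOTE_MINIMAL one should still be rejected.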

Moving on...

I also can't seem to use my own registered dialects:

>>> class unix(csv.Dialect):
...     delimiter = ':'
...     escapechar = '\\'
...     lineterminator = '\n'
...     quoting = csv.QUOTE_NONE
...     skipinitialspace = False
...     doublequote = False #Fine
...
>>> csv.register_dialect('unix', unix)
>>> csv.list_dialects()  #Worked
['excel-tab', 'excel', 'unix']
>>> fp = file('csvtest', 'wb')
>>> csv.writer(fp, 'unix')  #csv.reader fails too, same error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> ud = vars(unix)
>>> csv.writer(fp, **ud)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: '_csv.Dialect' object has no attribute '__module__'
>>> del ud['__module__']
>>> del ud['__doc__']
>>> cw = csv.writer(fp, **ud) #if **ud works, why not 'unix'?
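
The vars()/del dance above can be wrapped up in a small helper.  This is
only a sketch, assuming a current Python 3 interpreter (not 2.3.2);
dialect_kwargs, ColonDialect and the name 'colon' are hypothetical.  I
picked 'colon' instead of 'unix' because later Python versions ship their
own 'unix' dialect.  As far as I can tell, both the registered name and the
keyword form are accepted there:

```python
import csv
import io


def dialect_kwargs(dialect_cls):
    """Hypothetical helper: collect the formatting attributes defined on a
    dialect class as keyword arguments, skipping __module__, __doc__, etc."""
    names = ("delimiter", "escapechar", "lineterminator", "quoting",
             "skipinitialspace", "doublequote", "quotechar")
    return {name: getattr(dialect_cls, name)
            for name in names if name in vars(dialect_cls)}


class ColonDialect(csv.Dialect):
    delimiter = ':'
    escapechar = '\\'
    lineterminator = '\n'
    quoting = csv.QUOTE_NONE
    skipinitialspace = False
    doublequote = False


csv.register_dialect('colon', ColonDialect)

# Write the same row once via the registered name, once via keywords.
by_name = io.StringIO()
csv.writer(by_name, 'colon').writerow(['1', ':', '3'])

by_kwargs = io.StringIO()
csv.writer(by_kwargs, **dialect_kwargs(ColonDialect)).writerow(['1', ':', '3'])

print(repr(by_name.getvalue()), repr(by_kwargs.getvalue()))
```

Listing the attribute names explicitly avoids having to know which dunder
entries vars() happens to return.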

The third issue was mentioned above: csv.reader chokes on the quotechar,
even when quoting is QUOTE_NONE and the quotechar should be ignored.

>>> cw.writerow('1 : 3'.split())
>>> cw.writerow('1 " 3'.split())
>>> del cw
>>> fp.flush()
>>> fp.close()
>>> fp = file('csvtest', 'rb')
>>> fp.read() # Looks good
'1:\\::3\n1:":3\n'
>>> fp.seek(0)
>>> cr = csv.reader(fp, **ud)
>>> cr.next()
['1', ':', '3']
>>> cr.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string

My guess is that it's trying to grab a string delimited by ", and hits the
newline before getting a matching ":

>>> import StringIO
>>> fp = StringIO.StringIO()
>>> cw = csv.writer(fp, **ud)
>>> cr = csv.reader(fp, **ud)
>>> cw.writerow('1 " 3 " 5'.split())
>>> fp.buflist
['1:":3:":5\n']
>>> cr.next()
['1', ':3:', '5']

This should be ['1', '"', '3', '"', '5'].
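
To pin down the behaviour I expect, here is a small check that feeds the
raw lines from the sessions above to a QUOTE_NONE reader.  It is a sketch
that assumes a current Python 3 interpreter, where I believe the reader
treats '"' as ordinary data under QUOTE_NONE; the variable names are mine:

```python
import csv
import io

# The three lines written in the transcripts above, as raw text.
data = io.StringIO('1:\\::3\n1:":3\n1:":3:":5\n')

reader = csv.reader(data, delimiter=':', escapechar='\\',
                    quoting=csv.QUOTE_NONE)
rows = list(reader)

for row in rows:
    print(row)
```

Every '"' should come back as a one-character field of its own, and the
escaped ':' in the first line should come back as a plain ':'.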

One might just set quotechar=None for this dialect, but this raises:

>>> cr = csv.reader(fp, quotechar=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>>

This contradicts PEP 305, which says:

    quotechar specifies a one-character string to use as the quoting
    character.  It defaults to '"'.  Setting this to None has the same
    effect as setting quoting to csv.QUOTE_NONE.

So it seems impossible for csv.reader to parse a dialect that doesn't use
quoting whenever the data contains a quote character.
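
For comparison, this is the call the PEP text describes, written as a
sketch for a current Python 3 interpreter (an assumption; it is not the
2.3.2 behaviour shown above).  There, as far as I can tell, quotechar=None
on its own is accepted and switches quoting off:

```python
import csv
import io

data = io.StringIO('1:":3:":5\n')

# No quoting argument: quotechar=None alone is meant to imply QUOTE_NONE.
reader = csv.reader(data, delimiter=':', quotechar=None)
row = next(reader)

print(row)
print(reader.dialect.quoting == csv.QUOTE_NONE)
```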

--
Francis Avila