Some csv oddities
Francis Avila
francisgavila at yahoo.com
Thu Nov 27 02:00:31 EST 2003
Three things that seem odd with csv:
Python 2.3.2 (#49, Oct 2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> class unix(csv.Dialect):
...     delimiter = ':'
...     escapechar = '\\'
...     lineterminator = '\n'
...     quoting = csv.QUOTE_NONE
...     skipinitialspace = False
...
>>> csv.register_dialect('unix', unix)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "D:\PYTHON23\lib\csv.py", line 39, in __init__
    raise Error, "Dialect did not validate: %s" % ", ".join(errors)
_csv.Error: Dialect did not validate: doublequote parameter must be True or False
Now, it seems to me that QUOTE_NONE makes doublequote meaningless, because
there is no quote character to double. Nor does csv.writer escape the
quotechar: using the above dialect, cw.writerow(['1', '"', '3']) writes the
raw bytes '1:":3\n'. However, the corresponding csv.reader chokes on those
bytes.
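For comparison, here is a minimal sketch of the round trip this paragraph
expects, assuming a modern Python 3 csv module rather than the 2.3.2 build
used in this post. The keyword arguments mirror the unix dialect above;
quotechar=None is an added assumption so that '"' is plain data to both the
writer and the reader:

```python
import csv
import io

# Mirrors the 'unix' dialect above; quotechar=None (an assumption) means
# there is no quote character at all, so '"' is ordinary field data.
params = dict(delimiter=':', escapechar='\\', lineterminator='\n',
              quoting=csv.QUOTE_NONE, quotechar=None, doublequote=False)

buf = io.StringIO()
csv.writer(buf, **params).writerow(['1', '"', '3'])
written = buf.getvalue()  # the '"' field is written unescaped

buf.seek(0)
row = next(csv.reader(buf, **params))  # read the same text back
print(repr(written), row)
```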
Untested patch (csv.py, line 64):
         if self.doublequote not in (True, False):
-            errors.append("doublequote parameter must be True or False")
+            if self.quoting != QUOTE_NONE:
+                errors.append("doublequote parameter must be True or False")
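The check the patch proposes can be sketched outside csv.py.
validate_doublequote is a hypothetical helper, not part of the csv module;
it accepts any object with doublequote and quoting attributes and, like the
patch, only demands a real boolean when quoting is enabled:

```python
import csv
from types import SimpleNamespace


def validate_doublequote(dialect):
    """Hypothetical helper mirroring the proposed patch.

    doublequote must be True or False only when quoting is enabled;
    under QUOTE_NONE no quote character is written, so it is ignored.
    """
    errors = []
    if dialect.doublequote not in (True, False):
        if dialect.quoting != csv.QUOTE_NONE:
            errors.append("doublequote parameter must be True or False")
    return errors


# doublequote left unset (None) is accepted for a quote-free dialect...
no_quoting = SimpleNamespace(doublequote=None, quoting=csv.QUOTE_NONE)
# ...but still rejected when quoting is enabled.
minimal = SimpleNamespace(doublequote=None, quoting=csv.QUOTE_MINIMAL)

print(validate_doublequote(no_quoting))
print(validate_doublequote(minimal))
```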
Moving on...
I also can't seem to use my own registered dialects:
>>> class unix(csv.Dialect):
...     delimiter = ':'
...     escapechar = '\\'
...     lineterminator = '\n'
...     quoting = csv.QUOTE_NONE
...     skipinitialspace = False
...     doublequote = False  # Fine
...
>>> csv.register_dialect('unix', unix)
>>> csv.list_dialects() #Worked
['excel-tab', 'excel', 'unix']
>>> fp = file('csvtest', 'wb')
>>> csv.writer(fp, 'unix') #csv.reader fails too, same error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> ud = vars(unix)
>>> csv.writer(fp, **ud)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: '_csv.Dialect' object has no attribute '__module__'
>>> del ud['__module__']
>>> del ud['__doc__']
>>> cw = csv.writer(fp, **ud) #if **ud works, why not 'unix'?
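For reference, here is a sketch of the same registration assuming a modern
Python 3 csv module, where a registered name can be passed straight to
csv.writer and csv.reader. The dialect name 'colon_unix' and the class name
ColonUnix are hypothetical; Python 3 already ships a built-in dialect called
'unix', which registering under that name would replace:

```python
import csv
import io


class ColonUnix(csv.Dialect):
    """Colon-separated, backslash-escaped, no quoting (hypothetical name)."""
    delimiter = ':'
    escapechar = '\\'
    lineterminator = '\n'
    quoting = csv.QUOTE_NONE
    quotechar = None        # no quote character at all
    doublequote = False
    skipinitialspace = False


# Hypothetical registry name, chosen to avoid clobbering Python 3's 'unix'.
csv.register_dialect('colon_unix', ColonUnix)

buf = io.StringIO()
csv.writer(buf, 'colon_unix').writerow(['1', ':', '3'])
written = buf.getvalue()  # the ':' field is written as an escaped '\:'

buf.seek(0)
row = next(csv.reader(buf, dialect='colon_unix'))
print(repr(written), row)
```

Passing an instance directly, as in csv.writer(buf, dialect=ColonUnix()),
should also work and avoids copying vars() by hand.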
The third issue was mentioned above: csv.reader chokes on the quotechar,
even though QUOTE_NONE should make it irrelevant.
>>> cw.writerow('1 : 3'.split())
>>> cw.writerow('1 " 3'.split())
>>> del cw
>>> fp.flush()
>>> fp.close()
>>> fp = file('csvtest', 'rb')
>>> fp.read() # Looks good
'1:\\::3\n1:":3\n'
>>> fp.seek(0)
>>> cr = csv.reader(fp, **ud)
>>> cr.next()
['1', ':', '3']
>>> cr.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string
My guess is that it treats the " as the start of a quoted field and hits
the newline before finding the matching ":
>>> import StringIO
>>> fp = StringIO.StringIO()
>>> cw = csv.writer(fp, **ud)
>>> cr = csv.reader(fp, **ud)
>>> cw.writerow('1 " 3 " 5'.split())
>>> fp.buflist
['1:":3:":5\n']
>>> cr.next()
['1', ':3:', '5']
This should be ['1', '"', '3', '"', '5'].
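The guess above can be checked with a short sketch, again assuming a modern
Python 3 csv module. With quoting enabled, a '"' at the start of a field
opens a quoted field that swallows the delimiters up to the next '"',
reproducing ['1', ':3:', '5']; with QUOTE_NONE and quotechar=None the same
line splits as expected:

```python
import csv
import io

line = '1:":3:":5\n'

# Quoting enabled: '"' opens a quoted field that runs to the next '"',
# so the ':' characters in between are kept as data.
quoted = next(csv.reader(io.StringIO(line), delimiter=':',
                         quotechar='"', quoting=csv.QUOTE_MINIMAL))

# Quoting disabled and no quote character: '"' is an ordinary character.
unquoted = next(csv.reader(io.StringIO(line), delimiter=':',
                           quotechar=None, quoting=csv.QUOTE_NONE,
                           escapechar='\\'))

print(quoted)
print(unquoted)
```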
One might just set quotechar=None for this dialect, but this raises:
>>> cr = csv.reader(fp, quotechar=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>>
This contradicts the specification in PEP 305:
quotechar specifies a one-character string to use as the quoting character.
It defaults to '"'. Setting this to None has the same effect as setting
quoting to csv.QUOTE_NONE.
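For comparison, a modern Python 3 csv module (an assumption; the session
above is 2.3.2) does appear to follow this rule. In this minimal sketch only
quotechar=None is passed, quoting is left at its default, and the reader's
dialect ends up as QUOTE_NONE, so '"' is read as data:

```python
import csv
import io

# Only quotechar=None is given; quoting is left at its default.
reader = csv.reader(io.StringIO('1:":3\n'), delimiter=':', quotechar=None)

row = next(reader)
print(reader.dialect.quoting == csv.QUOTE_NONE, row)
```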
So it seems impossible for csv.reader to correctly parse a dialect that
doesn't use quoting whenever a field contains the quote character.
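With a csv.reader that mishandles QUOTE_NONE, as in the session above, one
workaround is to split such records by hand. The sketch below is
hypothetical (split_unquoted is not a csv function) and assumes one record
per line with no embedded newlines; only the delimiter and escapechar are
special, and '"' is ordinary data:

```python
def split_unquoted(line, delimiter=':', escapechar='\\'):
    """Hypothetical helper: split one quote-free, escape-based record.

    Assumes the record holds no embedded newlines, so any trailing
    line terminator can simply be stripped first.
    """
    line = line.rstrip('\r\n')
    fields, current, escaped = [], [], False
    for ch in line:
        if escaped:
            current.append(ch)      # escaped character is taken literally
            escaped = False
        elif ch == escapechar:
            escaped = True          # next character is data, not syntax
        elif ch == delimiter:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)      # '"' lands here: plain data
    if escaped:
        raise ValueError('record ends with a dangling escape character')
    fields.append(''.join(current))
    return fields


print(split_unquoted('1:\\::3\n'))
print(split_unquoted('1:":3:":5\n'))
```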
--
Francis Avila