mishandling of embedded NULs (was: Re: [Csv] trial zip/tar packages of csv module available)
John Machin
sjmachin at lexicon.net
Fri Feb 14 23:48:33 CET 2003
[John Machin]
>>> Judging by the fact that in _csv.c '\0' is passed around as a line-
>>> ending signal, it's not 8-bit-clean. This fact should be at least
>>> documented, if not fixed (which looks like a bit of a rewrite). Strange
>>> behaviour on embedded '\0' may worry not only pedants but also folk who
>>> are recipients of data files created by J. Random Boofhead III and
>>> friends.
[Andrew McNamara]
>> Yep - Skip - can you doco the fact that the input should not contain
>> null characters or be unicode strings?
>>
>> Null characters in the input will be treated as newlines, if I remember
>> correctly.
>
[John Machin]
> Docoing that would be useful as well.
[and it's me again:]
Actually it doesn't treat a NUL exactly like a newline. In the session
below, everything from the first NUL to the end of the line is thrown
away without any warning, whereas an embedded newline raises an error.
>>> import csv
>>> guff = ["aaa\0bbb", "x\0\0y"]
>>> [x for x in csv.reader(guff)]
[['aaa'], ['x']]
>>> guff2 = ["aaa\nbbb", "x\n\ny"]
>>> [x for x in csv.reader(guff2)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string
>>>
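Until this is documented or fixed, one possible workaround is to check each line for NULs before the reader sees it, so that bad input fails loudly instead of being truncated. The following is only a rough sketch: reject_nuls is a hypothetical helper name of my own, not part of the csv module, and it assumes the input is an iterable of strings, as in the session above.

```python
import csv


def reject_nuls(lines):
    """Yield each line unchanged; raise ValueError on an embedded NUL.

    Hypothetical helper for illustration only, not part of the csv module.
    """
    for lineno, line in enumerate(lines, 1):
        if "\0" in line:
            raise ValueError("NUL character in input line %d" % lineno)
        yield line


# Clean input passes straight through to the reader.
clean = ["aaa,bbb", "x,y"]
rows = list(csv.reader(reject_nuls(clean)))

# Input like guff above is rejected before csv.reader sees it.
try:
    list(csv.reader(reject_nuls(["aaa\0bbb", "x\0\0y"])))
    nul_detected = False
except ValueError:
    nul_detected = True
```

Raising is a deliberate choice here: stripping the NULs instead would hide the fact that the file was damaged or was never really text in the first place.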