help I'm getting delimited
sjmachin at lexicon.net
Thu Dec 18 00:06:47 CET 2008
On Dec 18, 3:15 am, aka <alexoploca... at gmail.com> wrote:
> John, this is the actual code I ran in TurboGears which is a Python
It's not complete -- the change in indentation would have caused a
If (as you appear to assert) the problem is in the csv module, then
create a small stand-alone no-TurboGears Python script and a test file
which together demonstrate the problem reproducibly so that the
problem can be investigated by anyone with a standard TurboGears-free
Python installation.
If you can't reproduce the problem in that manner, then you may need
to seek assistance in a TurboGears-specific forum.
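A stand-alone reproduction along those lines might look like the following Python 3 sketch. The thread predates Python 3, and both the test bytes and the `reproduce` helper are hypothetical. The idea is that the test file is built from literal bytes, so anyone can recreate it exactly, and the raw bytes are printed before parsing.

```python
import csv
import os
import tempfile

# Hypothetical test data: in a real report, paste the exact bytes of the
# problem file here (for example, copied from repr() output).
TEST_BYTES = b"name,qty\r\nwidget,3\r\n"


def reproduce(test_bytes, encoding):
    """Write test_bytes to a temporary file, then parse it with the csv module.

    Returns the parsed rows, or the exception that was raised, so the
    outcome can be pasted into a reply verbatim.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(test_bytes)
        # Show the raw bytes, which Notepad will not do.
        with open(path, "rb") as f:
            print("first bytes:", repr(f.read(16)))
        try:
            with open(path, newline="", encoding=encoding) as f:
                return list(csv.reader(f))
        except (UnicodeDecodeError, csv.Error) as exc:
            return exc
    finally:
        os.remove(path)


print(reproduce(TEST_BYTES, "utf-8"))
```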
> I should have left away the import statements. Trust me, the problem
> isn't in there because the UnicodeWriter is functioning perfectly.
Do you mean that this file was created by whatever.UnicodeWriter? If
so, did you just now discover this information?
How do you know that "the UnicodeWriter is functioning perfectly"?
What does "functioning perfectly" mean to you? In particular, what
encoding is it using?
> I did already sanitize the csv file to these four lines in Notepad so
> there isn't anything more than this:
Which do you mean:
(a) you typed those lines into Notepad yourself
(b) you took a copy of a file created by whatever.UnicodeWriter,
opened it with Notepad, trimmed off some rows and columns, and saved
You said earlier:
> csv.reader results in: for r in reader: Error: line contains NULL
> Use of UnicodeReader results in: UnicodeDecodeError: 'utf8' codec
> can't decode byte 0xff in position 0: unexpected code byte
Those results are consistent with your file being encoded in utf16_le,
with the utf16_le BOM ('\xff\xfe') at the start of the file.
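To see why those two messages point at UTF-16-LE, here is a small Python 3 sketch that builds such bytes by hand (the sample text and the `utf16le_with_bom` helper are hypothetical). The BOM is exactly the byte 0xff that the UTF-8 decoder rejects at position 0. The NUL byte after every ASCII character is what Python 2's byte-oriented csv.reader objected to.

```python
import codecs


def utf16le_with_bom(text):
    """Encode text as a BOM followed by UTF-16-LE, as a 'utf-16' writer
    typically produces on a little-endian (e.g. Windows/x86) machine."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Hypothetical CSV content; the four lines from the thread are not reproduced.
raw = utf16le_with_bom("name,qty\r\nwidget,3\r\n")

print(raw[:2])   # b'\xff\xfe': the BOM sits at position 0
print(raw[2:8])  # b'n\x00a\x00m\x00': a NUL after every ASCII character

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Python 3 reports: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
    print(exc)
```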
Have you, as I asked, looked at the file with some better-than-Notepad
Here's a likely hypothesis: the file was written in utf16. In that
case, either (i) you really want utf16 (why?), so:
(1) the csv module will not cope with it, and is not expected to cope
(2) the whatever.UnicodeReader should (in order of preference):
(a) be allowed to find out for itself that 'utf16' is the go
(b) be told explicitly that 'utf16' is the go
(c) be served with a bug report
OR (ii) you really want utf8, so:
(1) the csv module should be happy
(2) the whatever.UnicodeWriter should be told to use 'utf8'
(3) the whatever.UnicodeReader should (in order of preference):
[as above but s/16/8/]
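On Python 3, the csv module works on already-decoded text, so the alternatives above come down to choosing the encoding when the file is opened. The sketch below is hedged. `guess_encoding_from_bom`, `read_rows` and `rewrite_as_utf8` are hypothetical helpers, not the thread's UnicodeReader/UnicodeWriter. BOM sniffing is only a rough stand-in for "find out for itself", and it works only on files that actually start with a BOM.

```python
import codecs
import csv
import os
import tempfile


def guess_encoding_from_bom(path, default="utf-8"):
    """Rough analogue of option (a): infer the encoding from a leading BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its LE BOM begins with the UTF-16-LE BOM bytes.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the 'utf-16' codec consumes the BOM when decoding
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


def read_rows(path, encoding=None):
    """Option (b) when encoding is given explicitly; option (a) when it is None."""
    if encoding is None:
        encoding = guess_encoding_from_bom(path)
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


def rewrite_as_utf8(src, dst):
    """Option (ii): copy the rows into a UTF-8 file."""
    rows = read_rows(src)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "from_writer.csv")
    dst = os.path.join(tmp, "as_utf8.csv")
    # Simulate the suspected situation: a CSV written as UTF-16 with a BOM.
    with open(src, "w", newline="", encoding="utf-16") as f:
        csv.writer(f).writerows([["name", "qty"], ["widget", "3"]])
    print(guess_encoding_from_bom(src))  # utf-16
    rewrite_as_utf8(src, dst)
    print(read_rows(dst))                # [['name', 'qty'], ['widget', '3']]
```

Opening with `newline=""` follows the csv module's own advice for file objects. It lets the reader and writer handle the `\r\n` line terminators themselves.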