help I'm getting delimited

John Machin sjmachin at lexicon.net
Thu Dec 18 00:06:47 CET 2008


On Dec 18, 3:15 am, aka <alexoploca... at gmail.com> wrote:
> John, this is the actual code I ran in TurboGears which is a Python
> framework.

It's not complete -- the change in indentation would have caused a
SyntaxError.

If (as you appear to assert) the problem is in the csv module, then
create a small stand-alone no-TurboGears Python script and a test file
which together demonstrate the problem reproducibly so that the
problem can be investigated by anyone with a standard TurboGears-free
Python installation.

If you can't reproduce the problem in that manner, then you may need
to seek assistance in a TurboGears-specific forum.
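For illustration, a stand-alone script of the kind asked for above might
look something like the sketch below. It assumes Python 3, where the csv
module works on decoded text; the helper name try_csv and the sample data
are made up for the example and are not taken from the original code. It
writes known bytes to a scratch file, reads them back through csv.reader,
and reports either the rows or the exception.

```python
import csv
import os
import tempfile

# Hypothetical sample data, modelled on the four-line file quoted in this thread.
SAMPLE = "id;company;department\r\n12;Cadillac;Research\r\n"

def try_csv(raw_bytes, encoding):
    """Write raw_bytes to a scratch file, then read it back with csv.

    Returns the list of rows, or the exception that was raised.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        try:
            # newline="" is what the csv docs recommend for files passed to csv.reader
            with open(path, newline="", encoding=encoding) as f:
                return list(csv.reader(f, delimiter=";"))
        except (UnicodeError, csv.Error) as exc:
            return exc
    finally:
        os.remove(path)

if __name__ == "__main__":
    for label, raw in [("utf-8", SAMPLE.encode("utf-8")),
                       ("utf-16", SAMPLE.encode("utf-16"))]:
        print(label, "bytes read as utf-8 ->", repr(try_csv(raw, "utf-8")))
```

Anyone can run this without TurboGears, and the test data is created by
the script itself, so there is no doubt about what is in the file.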

> I should have left away the import statements. Trust me, the problem
> isn't in there because the UnicodeWriter is functioning perfectly.

Do you mean that this file was created by whatever.UnicodeWriter? If
so, did you just now discover this information?

How do you know that "the UnicodeWriter is functioning perfectly"?
What does "functioning perfectly" mean to you? In particular, what
encoding is it using?

> I did allready sanitate the csv file to these four lines in Notepad so
> there isn't anything more than this:
>
> id;company;department
> 12;Cadillac;Research
> 11;Ford;Accounting
> 10;Chrysler;Sales

Which do you mean:
(a) you typed those lines into Notepad yourself
(b) you took a copy of a file created by whatever.UnicodeWriter,
opened it with Notepad, trimmed off some rows and columns, and saved
it again
?

You said earlier
"""
csv.reader results in: for r in reader: Error: line contains NULL
byte

Use of UnicodeReader results in: UnicodeDecodeError: 'utf8' codec
can't decode byte 0xff in position 0: unexpected code byte
"""

Those results are consistent with your file being encoded in utf16_le,
with the utf16_le BOM ('\xff\xfe') at the start of the file.

Have you, as I asked, looked at the file with some better-than-Notepad
diagnostic apparatus?
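One cheap piece of diagnostic apparatus is Python itself: open the file
in binary mode and look at the repr of the first few bytes. The sketch
below (Python 3 assumed; sniff and inspect_file are names invented for
this example) also checks for a byte order mark and counts NUL bytes,
which are the two symptoms reported above.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff(raw):
    """Return (encoding implied by a leading BOM, or None; count of NUL bytes)."""
    nuls = raw.count(b"\x00")
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, nuls
    return None, nuls

def inspect_file(path, n=64):
    """Print the repr of the first n bytes of the file and return sniff() of them."""
    with open(path, "rb") as f:
        head = f.read(n)
    print(repr(head))
    return sniff(head)
```

A head that starts with b'\xff\xfe' and has a NUL after every ASCII
character is what utf16_le text looks like; a plain utf8 or ASCII file
containing only the data shown would have no BOM and no NULs.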

Here's a likely hypothesis: the file was written in utf16. In that
case:

EITHER (i) you really want utf16 (why?), so:

(1) the csv module will not cope with it, and is not expected to cope
with it
(2) the whatever.UnicodeReader should (in order of preference):
    (a) be allowed to find out for itself that 'utf16' is the go
    (b) be told explicitly that 'utf16' is the go
    (c) be served with a bug report

OR (ii) you really want utf8, so:

(1) the csv module should be happy
(2) the whatever.UnicodeWriter should be told to use 'utf8'
(3) the whatever.UnicodeReader should (in order of preference):
    [as above but s/16/8/]
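To make "be told explicitly" concrete, here is a minimal sketch of the
idea, not the whatever.UnicodeReader from the original code. It assumes
Python 3, where decoding happens before the text reaches csv.reader, so
the csv module never sees raw utf16 bytes. The function name read_rows is
invented for the example.

```python
import csv
import io

def read_rows(binary_stream, encoding):
    """Decode binary_stream with the given encoding and parse ';'-delimited rows."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    try:
        return list(csv.reader(text, delimiter=";"))
    finally:
        text.detach()  # leave binary_stream open; closing it is the caller's job
```

Usage would be along the lines of opening the file with mode "rb" and
calling read_rows(f, "utf-16"); the "utf-16" codec reads the BOM and
picks the byte order itself. For case (ii) pass "utf-8", or "utf-8-sig"
if the file may start with a utf8 BOM.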

HTH,
John


