help I'm getting delimited
alexoplocatie at gmail.com
Thu Dec 18 13:16:34 CET 2008
> On Dec 18, 3:15 am, aka <alexoploca... at gmail.com> wrote:
> Do you mean that this file was created by whatever.UnicodeWriter? If
> so, did you just now discover this information?
> How do you know that "the UnicodeWriter is functioning perfectly"?
> What does "functioning perfectly" mean to you? In particular, what
> encoding is it using?
> Which do you mean:
> (a) you typed those lines into Notepad yourself
> (b) you took a copy of a file created by whatever.UnicodeWriter,
> opened it with Notepad, trimmed off some rows and columns, and saved
> it again
> Here's a likely hypothesis: the file was written in utf16. In that
> either (i) you really want utf16 (why?), so:
> (1) the csv module will not cope with it, and is not expected to cope
> with it
> (2) the whatever.UnicodeReader should (in order of preference):
> (a) be allowed to find out for itself that 'utf16' is the go
> (b) be told explicitly that 'utf16' is the go
> (c) be served with a bug report
> OR (ii) you really want utf8, so:
> (1) the csv module should be happy
> (2) the whatever.UnicodeWriter should be told to use 'utf8'
> (3) the whatever.UnicodeReader should (in order of preference):
> [as above but s/16/8/]
The csv file originally was created by the UnicodeWriter class and
used for a mailmerge function with Microsoft Word which all
The reverse did not: read back the outputted file so at last I
it in Notepad, cutting off columns, but I didn't know that the
encoding would remain even after that because it still caused
Now after testing from the Python command line with a csv file
generated from Excel I could get it working so it had to be the
Because the write side of my code, which uses the UnicodeWriter, was
OK, I didn't pay attention to the fact that I had changed the UW class
from UTF-8 to UTF-16 because of difficulties with Dutch characters
like ë and ö.
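
For the archives, a small sketch of why that change mattered. It assumes
Python 3 string handling and does not use the UnicodeWriter class at all;
the sample text is made up for illustration. It only compares the bytes
the two encodings produce for the same characters.

```python
# Illustrative sample (an assumption, not data from my file): "ë,ö"
text = "\u00eb,\u00f6"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")

# UTF-8 handles the accented letters as two bytes each and keeps the
# comma as a single ASCII byte, so a byte-oriented parser still sees
# an ordinary delimiter.
print(utf8_bytes)

# The "utf-16" codec writes a byte order mark first and uses two bytes
# per character here, which introduces NUL bytes into the data.
print(utf16_bytes)
print(b"\x00" in utf16_bytes)
```

So both encodings can represent ë and ö; the difference is in the byte
layout that the reading side then has to cope with.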
Then at last I tried changing back to UTF-8 and noticed that both output
and input were working, including those special characters. So it was my
unjustified conclusion that I couldn't handle these special
characters on the write side without UTF-16 that ultimately got me
into trouble on the read side.
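
A minimal round-trip sketch of the working setup, again assuming Python 3
and the standard csv module rather than my actual UnicodeWriter and
UnicodeReader code. The helper names write_csv_bytes and read_csv_bytes
and the sample rows are hypothetical, chosen only for this example.

```python
import csv
import io


def write_csv_bytes(rows, encoding="utf-8"):
    """Serialize rows to CSV text, then encode the text to bytes."""
    text = io.StringIO(newline="")
    csv.writer(text).writerows(rows)
    return text.getvalue().encode(encoding)


def read_csv_bytes(data, encoding="utf-8"):
    """Decode bytes with the given encoding, then parse the CSV text."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))


# Hypothetical sample rows: "Zoë" and "Malmö"
rows = [["naam", "plaats"], ["Zo\u00eb", "Malm\u00f6"]]

data = write_csv_bytes(rows, encoding="utf-8")
read_back = read_csv_bytes(data, encoding="utf-8")
print(read_back == rows)
```

The point is only that the writer and the reader have to agree on the
encoding; with UTF-8 on both sides the special characters come back
unchanged.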
With your help I got it straight. Once again, reducing the problem
to its bare basics and avoiding overly large steps is the key.
Thanks a lot for your help John.
BTW, the TurboGears code is not very different from Python,
it just uses some extra identifiers.