[Python-Dev] Re: [Csv] Example workaround classes for using Unicode with csv module...

Mon Mar 21 00:28:02 CET 2005

>I added UnicodeReader and UnicodeWriter example classes to the csv module
>docs just now.  They mention problems with ASCII NUL characters (which I
>vaguely remember - NUL-terminated strings are used internally, right?).  Do
>NULs still present a problem?  I saw nothing in the log messages that
>mentioned "ascii" or "nul" so I presume it is.

That's right - it still uses NUL-terminated strings internally, and the
various special characters (quotechar, escapechar, etc) use NUL to mean
"not specified". Fixing this would cause much upheaval.
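
To see which encodings would trip over this, here is a minimal sketch
written for a current Python 3. The helper name has_nul is made up for
illustration and is not part of the csv module; it only inspects the
encoded bytes and does not call the parser at all.

```python
def has_nul(text, encoding):
    """Return True if encoding `text` produces at least one NUL (zero) byte."""
    # Iterating over / testing membership in a bytes object works on integers.
    return 0 in text.encode(encoding)


# Plain ASCII text stays NUL-free in utf-8, whereas utf-16 pads every
# ASCII character with a zero byte.
for enc in ("utf-8", "utf-16-le"):
    print(enc, has_nul("a,b", enc))
```

Any encoding for which this returns True on ordinary data would hand
NUL bytes to the parser.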

>Here's what I added.  Let me know if you think it needs any corrections,
>especially if there's a better way to word "as long as you avoid encodings
>like utf-16 that use NULs".  Can that just be "as long as you avoid
>multi-byte encodings other than utf-8"?  

I think only utf-8 provides the guarantees needed for this to work.
Specifically, every byte of a multi-byte character needs to have the
high bit set (otherwise a delimiter or other special character appearing
within a multi-byte character would upset the parsing), while the
characters with special meaning to the parser must still be encoded as
single bytes. Note also that none of the special characters (quotechar,
delimiter, escapechar, etc) can be a multi-byte sequence.
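
The high-bit point can be illustrated with a small sketch, again for a
current Python 3 and again only looking at encoded bytes. The names
DELIMITER and false_delimiters are hypothetical, and a comma is assumed
as the delimiter purely for the sake of the example.

```python
DELIMITER = ","  # assumed delimiter, for illustration only


def false_delimiters(text, encoding):
    """Return characters that are not the delimiter themselves but whose
    encoded form contains the delimiter's byte value."""
    delim_byte = ord(DELIMITER)
    return [ch for ch in text
            if ch != DELIMITER and delim_byte in ch.encode(encoding)]


# U+012C is a single non-ASCII letter; its utf-16 code unit is 0x012C,
# and 0x2C happens to be the byte value of ",".
sample = "\u012c"

# In utf-8 every byte of the multi-byte sequence has the high bit set,
# so it can never be mistaken for an ASCII special character.
print(all(b >= 0x80 for b in sample.encode("utf-8")))
print(false_delimiters(sample, "utf-8"))
print(false_delimiters(sample, "utf-16-le"))
```

A byte-oriented parser fed the utf-16 form would see a delimiter in the
middle of that one character, which is the kind of upset described
above.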

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/