[Python-Dev] Re: [Csv] Example workaround classes for using Unicode with csv module...
Andrew McNamara
andrewm at object-craft.com.au
Mon Mar 21 00:28:02 CET 2005
>I added UnicodeReader and UnicodeWriter example classes to the csv module
>docs just now. They mention problems with ASCII NUL characters (which I
>vaguely remember - NUL-terminated strings are used internally, right?). Do
>NULs still present a problem? I saw nothing in the log messages that
>mentioned "ascii" or "nul" so I presume it is.
That's right - it still uses NUL-terminated strings internally, and the
various special characters (quotechar, escapechar, etc) use NUL to mean
"not specified". Fixing this would cause much upheaval.
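To illustrate why embedded NULs matter for NUL-terminated strings, here is
a minimal sketch in present-day Python. It is not the csv module's actual
code: as_c_string is a hypothetical helper that only mimics how C code would
see a NUL-terminated buffer, and the sample row is made up.

```python
def as_c_string(data: bytes) -> bytes:
    """Hypothetical helper: return what C code would treat as the string,
    i.e. everything before the first NUL byte."""
    return data.split(b"\x00", 1)[0]


row = "a,b"

# UTF-8 encodes ASCII text without any NUL bytes, so nothing is lost.
utf8_seen = as_c_string(row.encode("utf-8"))

# UTF-16 pads every ASCII character with a NUL byte, so a NUL-terminated
# view of the buffer ends after the first character.
utf16_seen = as_c_string(row.encode("utf-16-le"))

print(utf8_seen)   # b'a,b'
print(utf16_seen)  # b'a'
```

The same reasoning applies to any encoding that can produce a zero byte
inside ordinary text.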
>Here's what I added. Let me know if you think it needs any corrections,
>especially if there's a better way to word "as long as you avoid encodings
>like utf-16 that use NULs". Can that just be "as long as you avoid
>multi-byte encodings other than utf-8"?
I think only utf-8 provides the guarantees needed for this to work.
Specifically, every byte of a multi-byte character needs to have the
high bit set (otherwise a delimiter or other special character appearing
within a multi-byte character would upset the parsing), while the
characters with special meaning to the parser must still be encoded as
single bytes. Note also that none of the special characters (quotechar,
delimiter, escapechar, etc) can be a multi-byte sequence.
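The high-bit property described above can be checked directly. The
following is a rough sketch in present-day Python, not part of the csv
module; the function name multibyte_bytes_have_high_bit and the sample text
are assumptions chosen for illustration.

```python
def multibyte_bytes_have_high_bit(text: str, encoding: str) -> bool:
    """Return True if every character that encodes to more than one byte
    consists only of bytes with the high bit set (>= 0x80)."""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


sample = "na\u00efve, \u201cquoted\u201d, \u65e5\u672c\u8a9e"

utf8_ok = multibyte_bytes_have_high_bit(sample, "utf-8")
utf16_ok = multibyte_bytes_have_high_bit(sample, "utf-16-le")

# U+012C encodes in UTF-16-LE as the bytes 0x2C 0x01, and 0x2C is the ASCII
# comma, so a byte-oriented parser would see a delimiter inside the character.
comma_inside_utf16 = b"," in "\u012c".encode("utf-16-le")
comma_inside_utf8 = b"," in "\u012c".encode("utf-8")

print(utf8_ok, utf16_ok)                      # True False
print(comma_inside_utf16, comma_inside_utf8)  # True False
```

In utf-8 the lead and continuation bytes of a multi-byte sequence all fall
in the range 0x80 to 0xFF, which is why a byte equal to an ASCII delimiter
can only ever be that delimiter.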
--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/