[Csv] Re: [Python-Dev] csv module TODO list

Wed Jan 5 10:34:14 CET 2005

>> Andrew McNamara wrote:
>>> There's a bunch of jobs we (CSV module maintainers) have been putting
>>> off - attached is a list (in no particular order):
>>> * unicode support (this will probably uglify the code considerably).
>> 
>Martin v. Löwis wrote:
>> Can you please elaborate on that? What needs to be done, and how is
>> that going to be done? It might be possible to avoid considerable
>> uglification.

I'm not altogether sure there. The parsing state machine is all written in
C, and deals with signed chars - I expect we'll need two versions of that
(or one version that's compiled twice using pre-processor macros). Quite
a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:
>Indeed. The trick is to convert to Unicode early and to use Unicode
>literals instead of string literals in the code.

Yes, although it would be nice to also retain the 8-bit versions as well.

>Note that the only real-life Unicode format in use is UTF-16
>(with BOM mark) written by Excel. Note that there's no standard
>for specifying the encoding in CSV files, so this is also the only
>feasable format.

Yes - that's part of the problem I hadn't really thought about yet - the
csv module currently interacts directly with files as iterators, but it's 
clear that we'll need to decode as we go.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/