[Csv] Re: [Python-Dev] csv module TODO list

Thu Jan 6 02:10:55 CET 2005

>>>>Can you please elaborate on that? What needs to be done, and how is
>>>>that going to be done? It might be possible to avoid considerable
>>>>uglification.
>> 
>> I'm not altogether sure there. The parsing state machine is all written in
>> C, and deals with signed chars - I expect we'll need two versions of that
>> (or one version that's compiled twice using pre-processor macros). Quite
>> a large job. Suggestions gratefully received.
>
>I'm still trying to understand what *needs* to be done - I would move to
>how this is done only later. What APIs should be extended/changed, and
>in what way?

That's certainly the first step, and I have to admit that I don't have
a clear idea at this time - the unicode issue has been in the "too hard"
basket since we started.

Marc-Andre Lemburg mentioned that he has encountered UTF-16 encoded csv
files, so a reasonable starting point would be the ability to read and
parse, as well as the ability to generate, one of these.

The reader interface currently returns a row at a time, consuming as many
lines from the supplied iterable as it needs (the most common iterable
being a file). This suggests to me that we will need an optional "encoding"
argument to the reader constructor, and that the reader will need to
decode the source lines. That said, I'm hardly a unicode expert, so I
may be overlooking something (could a utf-16 encoded character span a
line break, for example?). The writer interface probably should have
similar facilities.
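
As a rough illustration of the "decode, then parse" idea, here is a minimal
sketch. It assumes a current Python 3, where the csv module already accepts
text, rather than the module as it stood when this was written. The name
decoded_reader and its "encoding" parameter are hypothetical and not part of
the csv API. Decoding the byte stream as a whole, rather than splitting the
raw bytes into lines first, also sidesteps the line-break worry: in UTF-16
the byte 0x0A can occur inside a character that is not a newline (U+010A,
for instance).

```python
import csv
import io


def decoded_reader(binary_stream, encoding, **fmtparams):
    """Hypothetical helper: decode a byte stream, then hand text to csv.reader.

    newline="" stops the text layer from translating line endings, so the
    csv parser sees newlines embedded in quoted fields unchanged.
    """
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return csv.reader(text_stream, **fmtparams)


# Build a small UTF-16 encoded sample in memory (stands in for a file
# opened in binary mode). The second row has a quoted field that spans
# two physical lines.
raw = 'name,note\r\n\u010aensu,"line one\nline two"\r\n'.encode("utf-16")

rows = list(decoded_reader(io.BytesIO(raw), "utf-16"))
for row in rows:
    print(row)
```

A writer-side counterpart would presumably do the reverse, encoding each
formatted row on its way out.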

However - a number of people have complained about the "iterator"
interface, wanting to supply strings (the iterable is necessary because a
CSV row can span multiple lines). It's also conceivable that the source
lines could already be unicode objects.
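
To make the multi-line point concrete, here is a small sketch, again
assuming a current Python 3 where source lines are already text. The helper
rows_from_string is a hypothetical convenience, not an existing csv
function; it simply wraps the string in a file-like object so that the
reader can pull as many lines as each row needs.

```python
import csv
import io


def rows_from_string(text, **fmtparams):
    """Hypothetical helper: parse a whole CSV document held in one string."""
    # newline="" keeps the original line endings intact for the parser.
    return list(csv.reader(io.StringIO(text, newline=""), **fmtparams))


# Four physical lines, but only three logical rows: the quoted field in
# the second row contains a line break.
data = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

rows = rows_from_string(data)
for row in rows:
    print(row)
```

Feeding the reader one physical line at a time as an isolated string would
cut the second row in half, which is why the interface takes an iterable.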

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/