[Python-Dev] csv module TODO list
Andrew McNamara
andrewm at object-craft.com.au
Wed Jan 5 10:34:14 CET 2005
>> Andrew McNamara wrote:
>>> There's a bunch of jobs we (CSV module maintainers) have been putting
>>> off - attached is a list (in no particular order):
>>> * unicode support (this will probably uglify the code considerably).
>>
>Martin v. Löwis wrote:
>> Can you please elaborate on that? What needs to be done, and how is
>> that going to be done? It might be possible to avoid considerable
>> uglification.
I'm not altogether sure there. The parsing state machine is all written in
C, and deals with signed chars - I expect we'll need two versions of that
(or one version that's compiled twice using pre-processor macros). Quite
a large job. Suggestions gratefully received.
M.-A. Lemburg wrote:
>Indeed. The trick is to convert to Unicode early and to use Unicode
>literals instead of string literals in the code.
Yes, although it would be nice to retain the 8-bit versions as well.
>Note that the only real-life Unicode format in use is UTF-16
>(with BOM mark) written by Excel. Note that there's no standard
>for specifying the encoding in CSV files, so this is also the only
>feasible format.
Yes - that's part of the problem I hadn't really thought about yet - the
csv module currently interacts directly with files as iterators, but it's
clear that we'll need to decode as we go.
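
A minimal sketch of what "decode as we go" could look like, written in
present-day Python rather than against the module as it stands. The helper
name decoded_lines and the default encoding of UTF-16 are assumptions made
for illustration, not part of the csv module. It wraps an iterable of byte
chunks in an incremental decoder (which consumes the BOM) and hands complete
text lines to csv.reader:

```python
import codecs
import csv


def decoded_lines(chunks, encoding="utf-16"):
    """Yield text lines from an iterable of byte chunks (hypothetical helper).

    The incremental decoder buffers partial code units, so a chunk may end
    in the middle of a character. With "utf-16" it also reads the BOM to
    pick the byte order and strips it from the output.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    for chunk in chunks:
        pending += decoder.decode(chunk)
        # Hand over only complete lines; keep any unfinished tail back.
        while "\n" in pending:
            line, pending = pending.split("\n", 1)
            yield line + "\n"
    # Flush whatever the decoder is still holding at end of input.
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending


if __name__ == "__main__":
    raw = "name,city\r\nLöwis,Berlin\r\n".encode("utf-16")
    # Deliberately awkward 3-byte chunks to exercise the buffering.
    chunks = [raw[i:i + 3] for i in range(0, len(raw), 3)]
    for row in csv.reader(decoded_lines(chunks)):
        print(row)
```

Splitting only on "\n" keeps a "\r\n" pair together in one line, so the
reader never sees a bare "\n" as a separate empty row.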
--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/