[Csv] Out of town, BDFL pronouncement, incorporation, Unicode

Andrew McNamara andrewm at object-craft.com.au
Fri Feb 14 07:34:45 CET 2003


>Assuming nothing earth-shattering develops by mid-week, would one of you
>like to propose on python-dev that Guido pronounce on the PEP and give a
>thumbs-up or -down on the module?  I can take care of merging it into the
>Python distribution (stitch it into setup.py, the test directory and the
>libref manual) when I return.

Okay.

>Any thoughts from Dave and Andrew about Unicode?  Marc André Lemburg (or was
>it Martin von Löwis?) suggested just encoding Unicode as utf-8.  Someone
>else (Fredrik Lundh I believe) suggested a double-compilation scheme such as
>Modules/_sre.c uses.  One pass gets you 8-bit characters, the other wide
>characters.  Presumably, the correct state machine to execute would be
>chosen based upon the input data types.

What little I know about UTF-8 suggests that the current module should be
safe: NUL bytes won't appear, and every byte of a multi-byte character
has its high bit set, so none of them can be mistaken for an ASCII
delimiter or quote. None of the special characters can be a non-ASCII
character, of course. The user could do something like:

    csv.reader([line.encode('utf-8') for line in lines])

I think the Unicode files emitted by Excel are actually UTF-8 encoded,
so this won't even be necessary: the user will just have to decode each
field with the utf-8 codec after parsing.
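
As a rough illustration of why byte-wise parsing of UTF-8 should be safe,
here is a small Python 3 sketch. The sample string and the variable names
are made up for this example, and the plain split on a comma only stands
in for the real parser:

```python
# Python 3 sketch; the sample text is invented for illustration.
sample = "Löwis,André,日本語,plain"
encoded = sample.encode("utf-8")

# Text without U+0000 never produces a NUL byte in UTF-8.
has_nul = 0 in encoded

# Every byte that belongs to a multi-byte character has its high bit set,
# so any byte below 0x80 is a genuine ASCII character.
multibyte_ok = all(
    b >= 0x80
    for ch in sample
    if len(ch.encode("utf-8")) > 1
    for b in ch.encode("utf-8")
)

# Splitting on the ASCII delimiter byte therefore never cuts a character
# in half, and each field decodes cleanly afterwards.
fields = [f.decode("utf-8") for f in encoded.split(b",")]
```

This only shows the encoding property; quoting and embedded newlines
still need the real state machine.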

Proper Unicode support is something we probably should do (the user
might have a UCS-2 encoded file, etc.), but it won't happen in the next
week or so.
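
To show why a UCS-2 file is a different matter for a byte-oriented
parser, here is a hedged Python 3 sketch. It assumes the "utf-16-le"
codec as a stand-in for UCS-2 (the bytes are the same for characters in
the Basic Multilingual Plane), and the names are invented for the example:

```python
# Python 3 sketch; "utf-16-le" stands in for UCS-2 here (an assumption).
line = "a,b"
utf8_bytes = line.encode("utf-8")
ucs2_bytes = line.encode("utf-16-le")

# The UTF-8 form has no NUL bytes; the two-byte form has one NUL per
# ASCII character, which a byte-oriented parser would trip over.
utf8_nuls = utf8_bytes.count(0)
ucs2_nuls = ucs2_bytes.count(0)

# One workaround: decode to text first, then re-encode as UTF-8
# before handing the data to the parser.
recoded = ucs2_bytes.decode("utf-16-le").encode("utf-8")
```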

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

