[Csv] Out of town, BDFL pronouncement, incorporation, Unicode
Andrew McNamara
andrewm at object-craft.com.au
Fri Feb 14 07:34:45 CET 2003
>Assuming nothing earth-shattering develops by mid-week, would one of you
>like to propose on python-dev that Guido pronounce on the PEP and give a
>thumbs-up or -down on the module? I can take care of merging it into the
>Python distribution (stitch it into setup.py, the test directory and the
>libref manual) when I return.
Okay.
>Any thoughts from Dave and Andrew about Unicode? Marc André Lemburg (or was
>it Martin von Löwis?) suggested just encoding Unicode as utf-8. Someone
>else (Fredrik Lundh I believe) suggested a double-compilation scheme such as
>Modules/_sre.c uses. One pass gets you 8-bit characters, the other wide
>characters. Presumably, the correct state machine to execute would be
>chosen based upon the input data types.
What little I know about utf-8 suggests that the current module should be
safe - nulls won't appear, and every byte of a multi-byte character has
its high bit set, so none of them can be mistaken for an ASCII delimiter
or quote. None of the special characters (delimiter, quote character and
so on) can be a non-ASCII character, of course. The user could do
something like:
    csv.reader([line.encode('utf-8') for line in lines])
I think the unicode files emitted by Excel are actually utf-8 encoded,
so this won't even be necessary - the user will just have to decode each
field with the utf-8 codec.
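
The byte-level property this relies on can be checked with a short sketch. It is written for current Python 3 and is an illustration only: the helper name, the set of "special" bytes and the sample string are assumptions made up for this example, not part of the csv module.

```python
# Sketch only: names and sample data below are hypothetical.
SPECIALS = b',"\r\n\x00'  # delimiter, quote, line terminators, NUL


def utf8_is_delimiter_safe(text):
    """Return True if no non-ASCII character encodes to a special byte."""
    for ch in text:
        encoded = ch.encode('utf-8')
        if len(encoded) == 1:
            continue  # plain ASCII: the same single byte in utf-8
        # Every byte of a multi-byte sequence is >= 0x80, so it can never
        # equal an ASCII delimiter, quote, newline or NUL.
        if any(b < 0x80 or b in SPECIALS for b in encoded):
            return False
    return True


sample = 'André,Löwis,日本語'
safe = utf8_is_delimiter_safe(sample)

# Splitting the encoded bytes on the ASCII comma and decoding each field
# afterwards yields the same fields as splitting the text directly.
fields = [f.decode('utf-8') for f in sample.encode('utf-8').split(b',')]
```

The last line mirrors the "parse the bytes, then decode each field" approach described above, using a bare split in place of a real CSV parser.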
Proper unicode support is something we probably should do (the user
might have a UCS-2 encoded file, etc), but it won't happen in the next
week or so.
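
For comparison, here is a rough sketch of why a 16-bit encoding needs different handling, and of one way to cope with it in current Python 3, where the csv module operates on text rather than bytes. This is not what the module under discussion does; the sample data is hypothetical, and UTF-16 stands in for UCS-2 because that is the codec Python ships.

```python
import csv
import io

# Hypothetical sample: a small CSV document encoded as UTF-16 with a BOM.
raw = 'name,city\r\n"Löwis, Martin",Berlin\r\n'.encode('utf-16')

# ASCII characters carry a zero byte in UTF-16, so a parser working on
# raw bytes would see NULs throughout the data.
has_nuls = b'\x00' in raw

# Decode to text first; newline='' leaves line-ending handling to csv.
text = io.TextIOWrapper(io.BytesIO(raw), encoding='utf-16', newline='')
rows = list(csv.reader(text))
```

Decoding before parsing sidesteps the byte-level question entirely, at the cost of requiring the caller to know the file's encoding up front.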
--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/