[Csv] Re: Unicode again

Andrew McNamara andrewm at object-craft.com.au
Wed Feb 12 07:49:43 CET 2003


>I've been thinking a little about the Unicode issue some more.  I really
>think you don't want to dive into picking apart Unicode strings.  If
>nothing else, you'll have to deal with a mixture of wide and narrow
>characters.  How about two paths?  If you know everything's a plain
>string, execute your current code.  If any elements are Unicode strings,
>take the slower, high-level path.

I've had a bit of a chance to look at the C Unicode implementation, and
it's pretty clean: essentially a Unicode string is an array of unsigned
shorts (or unsigned longs if Python was built with wide Unicode support)
instead of unsigned chars. Generally you don't have to worry about
variable-length data (on a narrow build, characters outside the Basic
Multilingual Plane are stored as two-unit surrogate pairs, but we'd cover
99.99% of use cases by ignoring those exceptions).

I think I currently favour the approach used in sre, where preprocessor
tricks are used to compile two versions of the core (one for 8-bit
strings, one for Unicode), but I'm sure this won't be trivial. It's
probably not something we can deal with before Python 2.3. Hopefully
this won't preclude integration with 2.3.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

