Where is the ucs-32 codec?

cben@users.sf.net beni.cherniavsky at gmail.com
Fri Jun 9 15:34:38 CEST 2006


Martin v. Löwis wrote:
> Erik Max Francis wrote:
> >> The only reason is that nobody has needed one so far, and because
> >> it is quite some work to do if done correctly. Why do you need it?
> >
Somebody asked me about generating UTF-32 (he had no choice of
output format).
I was about to propose the obvious ``u.encode('utf-32')`` but
discovered that the codec is missing.
Someone proposed 'unicode-internal', but that depends on the build
(narrow or wide) and is an ugly answer.
Next time, I want Guido's Time Machine to just work, so I have to fix
this ;-).
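
To make the request concrete, here is a minimal sketch of what such an
encoder has to produce, written for present-day Python 3 where iterating
a str yields whole code points. The name ``encode_utf32`` and its
``byteorder``/``bom`` parameters are made up for illustration; this is not
a real codec and does no error handling.

```python
import struct

def encode_utf32(text, byteorder="little", bom=True):
    """Hypothetical helper: pack each code point as one 32-bit unit."""
    # "<I" is a little-endian unsigned 32-bit integer, ">I" is big-endian.
    fmt = "<I" if byteorder == "little" else ">I"
    chunks = []
    if bom:
        # U+FEFF written in the chosen byte order serves as the byte order mark.
        chunks.append(struct.pack(fmt, 0xFEFF))
    for ch in text:
        # Assumption: text holds no lone surrogates; a real codec would
        # have to decide how to reject or handle them.
        chunks.append(struct.pack(fmt, ord(ch)))
    return b"".join(chunks)

# Usage: one character outside the BMP still takes exactly four bytes.
data = encode_utf32("A\U0001D11E", bom=False)
```

On a narrow build of the 2006 interpreter the loop would see surrogate
pairs instead of whole code points, so they would have to be combined
first.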

> > Why would it be "quite some work"?  Converting from UTF-16 to UTF-32 is
> > pretty straightforward, and UTF-16 is already supported.
>
> I would like to see it correct, unlike the current UTF-16 codec. Perhaps
> whoever contributes an UTF-32 codec could also deal with the defects of
> the UTF-16 codec.
>
Now this is interesting, as I hoped to base my code on UTF-16 (and
perhaps UTF-8 for combining surrogates)...  Can you elaborate?
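
For reference, the surrogate-combining step I have in mind can be
sketched like this (Python 3 syntax). The function name is hypothetical,
the input is assumed to be a sequence of integer UTF-16 code units, and
raising ValueError on an unpaired surrogate is only one possible policy,
not necessarily the "correct" behaviour being asked for.

```python
def utf16_units_to_code_points(units):
    """Hypothetical helper: combine surrogate pairs into code points."""
    it = iter(units)
    points = []
    for unit in it:
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            low = next(it, None)
            if low is None or not (0xDC00 <= low <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            # Each surrogate carries 10 bits; the pair encodes a value
            # offset by 0x10000.
            points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            raise ValueError("unpaired low surrogate")
        else:
            points.append(unit)
    return points
```

Each resulting code point would then be written out as one 32-bit unit.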

I could attempt to fix UTF-16 as well, but I don't have the expertise to
choose the right behaviour, so you'll have to specify precisely what it
should do that it doesn't do now.



