[Python-Dev] PEP 277 (unicode filenames): please review

Guido van Rossum guido@python.org
Wed, 14 Aug 2002 08:13:05 -0400


> > Do I misunderstand something, or is this a bug (limitation?) in the
> > unicode->latin-1 decoder?
> 
> It's a limitation, in all codecs. Contributions of normalization code
> are welcome. Since this is hard work, this is unlikely to be fixed in
> Python 2.3 - unless somebody has a really good incentive for fixing
> it.

Note that normalization doesn't belong in the codecs (except perhaps
as a separate Unicode->Unicode codec, since codecs seem to be useful
for all string->string transformations).  It's a separate step that
the application has to request; only the app knows whether a
particular Unicode string is already normalized or not, and whether
the expense is useful for the app, or not.
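
As a rough illustration of keeping normalization as an explicit,
app-requested step, here is a minimal sketch. It assumes the
unicodedata.normalize() function from the standard library and
modern Python string semantics; the helper name encode_latin1 and its
normalize flag are hypothetical, invented only for this example.

```python
import unicodedata


def encode_latin1(text, normalize=False):
    """Encode text as Latin-1 (hypothetical helper for illustration).

    The codec itself never normalizes; the caller opts in explicitly.
    """
    if normalize:
        # NFC composes base characters with their combining marks
        # where a precomposed code point exists.
        text = unicodedata.normalize("NFC", text)
    return text.encode("latin-1")


# "e" followed by COMBINING ACUTE ACCENT (U+0301): a decomposed form.
decomposed = "e\u0301"

# Without normalization the codec fails, because U+0301 has no
# Latin-1 equivalent.
try:
    encode_latin1(decomposed)
    failed_without_normalization = False
except UnicodeEncodeError:
    failed_without_normalization = True

# With normalization requested, the pair composes to U+00E9, which
# Latin-1 can represent as a single byte.
composed_bytes = encode_latin1(decomposed, normalize=True)
```

An app that already knows its strings are composed would simply leave
normalize off and skip the extra pass.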

--Guido van Rossum (home page: http://www.python.org/~guido/)