Flexible string representation, unicode, typography, ...
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Thu Aug 30 02:55:01 EDT 2012
On Wed, 29 Aug 2012 08:43:05 -0700, wxjmfauth wrote:
> I can hit the nail a little more.
> I have even a better idea and I'm serious.
>
> If "Python" has found a new way to cover the set of the Unicode
> characters, why not proposing it to the Unicode consortium?
Because the implementation of the str datatype in a programming language
has nothing to do with the Unicode consortium. You might as well propose
it to the International Union of Railway Engineers.
> Unicode has already three schemes covering practically all cases: memory
> consumption, maximum flexibility and an intermediate solution.
And Python's solution uses those: UCS-2, UCS-4, and UTF-8.
The only thing which is innovative here is that instead of the Python
implementation declaring that "all strings will be stored in UCS-2", the
runtime chooses a representation for each string when it is created,
based on the widest character in it. So some strings will be stored
internally as UCS-4, some as UCS-2, and some as one byte per character:
Latin-1, or plain ASCII (which is a standard, but not the Unicode
consortium's standard).
(As far as I can tell from reading the PEP, UTF-8 is not one of the
storage forms; a string can only cache a UTF-8 copy of itself alongside
the real one.)
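
A rough sketch of that per-string choice, assuming the three widths
described in PEP 393 (1, 2 or 4 bytes per character, picked from the
highest code point in the string). The name storage_width is made up for
illustration; it is not a CPython function, and the real decision happens
in C when the string object is allocated.

```python
import sys


def storage_width(s):
    """Bytes per character a PEP 393 style string would need (hypothetical helper)."""
    # The widest character decides the width for the whole string.
    highest = max(map(ord, s), default=0)
    if highest < 0x100:
        return 1  # Latin-1 range, which includes ASCII
    if highest < 0x10000:
        return 2  # UCS-2: the rest of the Basic Multilingual Plane
    return 4      # UCS-4: everything else


# sys.getsizeof is implementation-specific; on CPython 3.3 or later the
# totals should grow roughly in step with the width.
for text in ("spam", "sp\u00e4m", "sp\u20acm", "sp\U0001F600m"):
    print(ascii(text), storage_width(text), sys.getsizeof(text))
```

The exact sizes printed depend on the interpreter and platform, so treat
them as indicative only.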
There's nothing radical here, honest.
--
Steven