Stephen J. Turnbull
stephen at xemacs.org
Sat Feb 25 20:44:14 CET 2006
>>>>> "Ron" == Ron Adam <rrr at ronadam.com> writes:
Ron> So, let's consider a "codec" and a "coding" as being two
Ron> different things where a codec is a character subset of
Ron> unicode characters expressed in a native format. And a
Ron> coding is *not* a subset of the unicode character set, but an
Ron> _operation_ performed on text.
Ron> codec -> text is always in *one_codec* at any time.
No, a codec is an operation, not a state.
And text qua text has no need of state; the whole point of defining
text (as in the unicode type) is to abstract from such details.
Ron> Pure codecs such as latin-1 can be invoked over and over and
Ron> you can always get back what you put in in a single step.
Maybe you'd like to define them that way, but it doesn't work in
general. Given that str and unicode currently don't carry state with
them, it's not possible for "to ASCII" and "to EBCDIC" to be
idempotent at the same time. And for the languages spoken by 75% of
the world's population, "to latin-1" cannot be successfully invoked
even once, let alone be idempotent. You really need to think about
how your examples apply to codecs like KOI8-R for Russian and Shift
JIS for Japanese.
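A rough sketch of both points, written for Python 3, where str plays the role of the unicode type discussed here. The helper can_encode and the sample strings are made up for illustration, and cp500 is used only as one of the EBCDIC codecs that ship with Python:

```python
# Python 3 sketch: `str` here plays the role of the 2006-era `unicode` type.

def can_encode(text, codec):
    """Hypothetical helper: report whether `text` is representable in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

russian = "Привет"
japanese = "日本語"

# "to latin-1" fails outright for text outside its repertoire...
latin1_ok = [can_encode(t, "latin-1") for t in (russian, japanese)]
# ...while the codecs designed for those languages handle it.
native_ok = [can_encode(russian, "koi8_r"), can_encode(japanese, "shift_jis")]

# The same text under two codecs gives different bytes, and the bytes
# themselves do not record which codec produced them.
ascii_bytes = "abc".encode("ascii")
ebcdic_bytes = "abc".encode("cp500")  # cp500 is one of Python's EBCDIC codecs

# Reading ASCII bytes as if they were EBCDIC raises no error; it just
# yields different text.
misread = ascii_bytes.decode("cp500")
```

Nothing in ascii_bytes or ebcdic_bytes says which codec made them, so a "to ASCII" step has to be told (or guess) what it was handed. That is the missing state that keeps "to ASCII" and "to EBCDIC" from both being idempotent.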
In practice, I just don't think you can distinguish "codecs" from
"coding" using the kind of mathematical properties you have described.
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.