Unicode questions
M.-A. Lemburg
mal at egenix.com
Wed Oct 20 08:41:01 EDT 2010
Tobiah wrote:
> I've been reading about Unicode today.
> I only vaguely understand what it is
> and how it works.
>
> Please correct my understanding where it is lacking.
> Unicode is really just a database of character information
> such as the name, Unicode section, possible
> numeric value, etc. These points of information
> are indexed by standard, never-changing numeric
> indexes, so that 0x2CF might point to some
> character information set that all the world
> can agree on. The actual image that gets
> displayed in response to the integer is generally
> assigned and agreed upon, but it is up to the
> software responding to the Unicode value to define
> and generate the actual image that will represent that
> character.
Correct. The "actual images" are called glyphs in Unicode-speak.
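As a rough illustration of the "database" view, Python 3's standard unicodedata module exposes some of these per-code-point properties. This is a minimal sketch: the describe() helper is hypothetical, and the module only reports database properties, not glyphs. Rendering remains up to the font and display software.

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Return a few Unicode Character Database properties for one code point."""
    ch = chr(code_point)
    return {
        "code_point": f"U+{code_point:04X}",
        # Official character name; the default is returned for unnamed code points
        "name": unicodedata.name(ch, "<unnamed>"),
        # Two-letter general category, e.g. 'Lu' = uppercase letter, 'Nd' = decimal digit
        "category": unicodedata.category(ch),
        # Numeric value if the database assigns one, otherwise None
        "numeric": unicodedata.numeric(ch, None),
    }


if __name__ == "__main__":
    # 'A', 'é', '½' and ARABIC-INDIC DIGIT FOUR, chosen as arbitrary samples
    for cp in (0x41, 0xE9, 0xBD, 0x0664):
        print(describe(cp))
```

For example, describe(0xBD) reports the name VULGAR FRACTION ONE HALF with the numeric value 0.5.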
> Now for the mysterious encodings. There are UTF-{8,16,32},
> which only seem to indicate what the binary representation
> of the Unicode code points is going to be. Then there
> are 100 or so other encodings, many of which are language
> specific. ASCII encoding happens to be a 1-1 mapping up
> to 127, but then there are others for various languages, etc.
> I was thinking maybe this special case and the others were lookup
> mappings, where a particular language user could work with
> characters perhaps in the range of 0-255 like we do for ASCII,
> but then when decoding, to share with others, the plain Unicode
> representation would be shared? Why can't we just say "Unicode
> is Unicode" and just share files the way ASCII users do? Just have
> a huge ASCII-style table that everyone sticks to. Please enlighten
> my vague and probably ill-formed conception of this whole thing.
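The guess about the language-specific encodings can be checked for the single-byte case: legacy 8-bit codecs such as Latin-1 or Windows-1251 act as lookup tables from byte values 0-255 to Unicode code points. A minimal Python 3 sketch, assuming three codecs that ship with Python; the codec selection, the byte 0xE9, and the helper names are arbitrary illustrations. Note that not every legacy encoding is single-byte or ASCII-compatible, so this only covers codecs like these.

```python
# Single-byte legacy codecs act as lookup tables: byte value -> code point.
CODECS = ("latin-1", "cp1251", "iso8859_7")  # Western European, Cyrillic, Greek


def decode_byte(value: int) -> dict:
    """Decode one byte value under each codec and return the resulting character."""
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in CODECS}


def ascii_range_agrees() -> bool:
    """Check that these particular codecs all match ASCII for bytes 0-127."""
    ascii_bytes = bytes(range(128))
    expected = ascii_bytes.decode("ascii")
    return all(ascii_bytes.decode(codec) == expected for codec in CODECS)


if __name__ == "__main__":
    print(ascii_range_agrees())
    for codec, ch in decode_byte(0xE9).items():
        print(codec, ch, f"U+{ord(ch):04X}")
```

The same byte 0xE9 decodes to U+00E9 (é), U+0439 (й) and U+03B9 (ι) respectively, so a byte on its own does not identify a character; the reader also has to know which encoding was used.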
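UTF-n are transfer encodings of the Unicode table (the one you
are probably referring to). They represent the same code points,
but with different trade-offs: UTF-8 uses one to four bytes per
code point and leaves ASCII text unchanged, UTF-16 uses two or
four bytes, and UTF-32 uses a fixed four bytes.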
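To make the trade-offs concrete, here is a minimal Python 3 sketch that encodes the same four code points with UTF-8, UTF-16 and UTF-32. The little-endian codec variants are used only so that Python does not prepend a byte order mark; the sample characters and helper names are arbitrary.

```python
# The same code points, serialized by three Unicode transfer encodings.
SAMPLE = "A\u00e9\u20ac\U0001D11E"  # 'A', 'é', '€', MUSICAL SYMBOL G CLEF
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")  # -le variants: no byte order mark


def bytes_per_char(text: str, encoding: str) -> list:
    """Return how many bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


def round_trips(text: str, encoding: str) -> bool:
    """Encoding and then decoding must give back exactly the same code points."""
    return text.encode(encoding).decode(encoding) == text


if __name__ == "__main__":
    for enc in ENCODINGS:
        print(enc, bytes_per_char(SAMPLE, enc), round_trips(SAMPLE, enc))
    # Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
    print("hello".encode("utf-8") == "hello".encode("ascii"))
```

This prints per-character sizes of [1, 2, 3, 4] for UTF-8, [2, 2, 2, 4] for UTF-16 (the G clef needs a surrogate pair) and [4, 4, 4, 4] for UTF-32, with every round trip succeeding.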
If you're looking for a short intro to Unicode in Python,
have a look at these talks I've given on the subject:
http://www.egenix.com/library/presentations/#PythonAndUnicode
http://www.egenix.com/library/presentations/#DesigningUnicodeAwareApplicationsInPython
--
Marc-Andre Lemburg
eGenix.com