Unicode questions

M.-A. Lemburg mal at egenix.com
Wed Oct 20 08:41:01 EDT 2010


Tobiah wrote:
> I've been reading about Unicode today.
> I only vaguely understand what it is
> and how it works.
> 
> Please correct my understanding where it is lacking.
> Unicode is really just a database of character information
> such as the name, Unicode section, possible
> numeric value etc.  These points of information
> are indexed by standard, never-changing numeric
> indexes, so that 0x2CF might point to some
> character information set that all the world
> can agree on.  The actual image that gets
> displayed in response to the integer is generally
> assigned and agreed upon, but it is up to the
> software responding to the Unicode value to define
> and generate the actual image that will represent that
> character.

Correct. The "actual images" are called glyphs in Unicode-speak.
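
To make the "database indexed by number" idea concrete, here is a minimal
sketch using Python's standard unicodedata module. It assumes Python 3
(where chr() accepts any code point); the helper name describe() is made
up for this illustration and is not part of any library.

```python
import unicodedata

def describe(code_point):
    """Return a few Unicode database properties for an integer code point."""
    ch = chr(code_point)
    return {
        "code_point": "U+%04X" % code_point,
        # name() and numeric() accept a default for characters that
        # have no name or no numeric value.
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "numeric": unicodedata.numeric(ch, None),
    }

info_a = describe(0x41)       # LATIN CAPITAL LETTER A
info_half = describe(0xBD)    # VULGAR FRACTION ONE HALF
info_q = describe(0x2CF)      # the code point from the question

for info in (info_a, info_half, info_q):
    print(info)
```

Nothing in this lookup says anything about how the character is drawn;
that part is left to the font and the rendering software.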

> Now for the mysterious encodings.  There are the UTF-{8,16,32}
> encodings, which only seem to indicate what the binary
> representation of the Unicode code points is going to be.  Then
> there are 100 or so other encodings, many of which are language
> specific.  ASCII happens to be a 1-1 mapping up to 127, but then
> there are others for various languages etc.  I was thinking maybe
> this special case and the others were lookup mappings, where a
> particular language user could work with characters perhaps
> in the range of 0-255 like we do for ASCII, but then when
> decoding, to share with others, the plain Unicode representation
> would be shared?  Why can't we just say "Unicode is Unicode"
> and just share files the way ASCII users do?  Just have a huge
> ASCII-style table that everyone sticks to.  Please enlighten
> my vague and probably ill-formed conception of this whole thing.

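The UTF-n encodings are transformation formats for the Unicode table
(the one you are probably referring to). They all represent the same
code points as bytes, but with different trade-offs: UTF-8 uses 1-4
bytes per code point and is byte-compatible with ASCII, UTF-16 uses
2 or 4 bytes, and UTF-32 always uses 4 bytes. Most of the other
encodings are older character sets that map a small range of byte
values (often 0-255) to a subset of those code points.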
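
As a rough illustration of those trade-offs, the sketch below (Python 3
assumed; the sample string and variable names are arbitrary choices for
this example) encodes the same three code points with each UTF and then
decodes one raw byte under two different 8-bit code pages.

```python
# Three code points: 'A' (U+0041), e with acute accent (U+00E9),
# euro sign (U+20AC).
text = "A\u00e9\u20ac"

# The "-le" variants fix the byte order and suppress the BOM, so the
# lengths below count only the bytes of the characters themselves.
encoded = {name: text.encode(name)
           for name in ("utf-8", "utf-16-le", "utf-32-le")}
sizes = {name: len(data) for name, data in encoded.items()}

# Every UTF round-trips to the same code points.
round_trips = all(data.decode(name) == text
                  for name, data in encoded.items())

# An 8-bit code page is a lookup table from byte values to code points,
# so the same byte means different characters under different tables.
raw = b"\xe9"
as_latin1 = raw.decode("latin-1")   # U+00E9, e with acute accent
as_cp1251 = raw.decode("cp1251")    # U+0439, Cyrillic short i

print(sizes, round_trips, ascii(as_latin1), ascii(as_cp1251))
```

This is why bytes cannot be shared as "just Unicode": the code points are
agreed on by everyone, but the reader still has to know which encoding
turned them into bytes.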

If you're looking for a short intro to Unicode in Python,
have a look at these talks I've given on the subject:

http://www.egenix.com/library/presentations/#PythonAndUnicode
http://www.egenix.com/library/presentations/#DesigningUnicodeAwareApplicationsInPython

-- 
Marc-Andre Lemburg
eGenix.com



