sqlalchemy and Unicode strings: errormessage

Chris Angelico rosuav at gmail.com
Tue May 31 13:19:52 EDT 2011


On Wed, Jun 1, 2011 at 2:31 AM, Prasad, Ramit <ramit.prasad at jpmchase.com> wrote:
>>line = unicode(line.strip(),'utf8')
>>and now I really get utf8-strings. It does work, but I don't know why it works. To me it looks like I change a utf8-string to a utf8-string.
>
>
> I would like to point out that UTF-8 is not exactly "Unicode". From what I understand, Unicode is a standard while UTF-8 is like an implementation of that standard (called an encoding). Being able to convert to Unicode (the standard) should mean you are then able to convert to any encoding that supports the Unicode characters used.

Unicode defines characters; UTF-8 is one way (of many) to represent
those characters in bytes. UTF-16 and UTF-32 are other ways of
representing those characters in bytes, and internally, Python
probably uses one of them - but there is no guarantee, and you should
never need to know. Unicode strings can be stored in memory and
manipulated in various ways, but they're a high-level construct on par
with lists and dictionaries - they can't be stored on disk or
transmitted to another computer without first being encoded as bytes.
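
To make the text/bytes distinction concrete, here is a rough sketch. It
assumes Python 3 syntax, where str is the Unicode type and bytes holds
encoded data; in Python 2, as used in the original question,
unicode(raw, 'utf8') performs the same decoding step as
raw.decode('utf-8') below. The sample values are made up for
illustration.

```python
# Unicode text: four characters, the last one is U+00E9 (e with acute).
text = "caf\u00e9"

# Two different byte representations of the same text.
as_utf8 = text.encode("utf-8")       # b'caf\xc3\xa9'
as_utf16 = text.encode("utf-16-le")  # b'c\x00a\x00f\x00\xe9\x00'

# The bytes differ, but both decode back to the same Unicode string.
assert as_utf8 != as_utf16
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le") == text

# The situation from the question: a line read from a file is bytes
# that happen to be UTF-8 encoded. Decoding turns it into real text.
raw_line = b"  caf\xc3\xa9\n"             # hypothetical file content
line = raw_line.strip().decode("utf-8")   # Python 2: unicode(raw_line.strip(), 'utf8')

print(len(raw_line.strip()), len(line))   # 5 bytes, but 4 characters
```

So the call in the question is not turning a UTF-8 string into a UTF-8
string; it turns UTF-8 encoded bytes into a Unicode string.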

UTF-8 is an efficient way to translate Unicode text consisting
primarily of low-codepoint characters into bytes. It's not so much an
implementation of Unicode as a means of converting the abstract concept
of "Unicode characters" into a concrete stream of bytes.
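
The "low codepoint" point can be checked directly. This is only a
sketch in Python 3 syntax, and the four sample characters are my own
choice: it counts how many bytes each one takes in UTF-8, UTF-16 and
UTF-32 (the -le variants are used so that no byte order mark is added
to the counts).

```python
# Sample characters at increasing code points (chosen for illustration):
# U+0041 'A', U+00E9 e-acute, U+20AC euro sign, U+1F600 an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each character to its encoded size in (UTF-8, UTF-16, UTF-32) bytes.
sizes = {
    ch: (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )
    for ch in samples
}

for ch, (u8, u16, u32) in sizes.items():
    print("U+%04X  utf-8=%d  utf-16=%d  utf-32=%d" % (ord(ch), u8, u16, u32))
```

ASCII characters cost one byte each in UTF-8, while characters further
up the code point range cost two, three or four, which is why UTF-8 is
compact for mostly-ASCII text and less so for, say, CJK text.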

Hope that clarifies things a little!

Chris Angelico


