[Chicago] understanding unicode problems
Pete
pfein at pobox.com
Fri Nov 16 19:25:55 CET 2007
On Friday November 16 2007 12:56:37 pm Carl Karsten wrote:
> "no memory representation (that you care about)" - um, I do care. From
> what I am reading, there is some representation that can only exist in ...
> ram? and can not be written to disk. What, will it conjure up demons or
> suck my drive into a black hole?
Really, you'll be much happier if you just close your eyes & think of England.
As Kumar notes, the whole reason we use Python is so that we don't have to
think about memory layout issues. You don't go around worrying about how
your Python/Java/C++/VB classes are laid out in memory, do you? This is no
different.
> > encode() takes a unicode and produces a str
> > decode() takes a str and produces a unicode
>
> For this to help, unicode and str need to be defined better. but I think I
> made that clear :)
str == bytes. Like, the same bytes you grew up with writing BASIC on your
Amiga, though dressed up sexy in OO.
unicode == text. With all the nastiness of character sets & memory
representation hidden so you don't have to worry about it.
You can treat a str like text if you want, but that's your business. And doing
so will give you encoding errors as soon as non-ASCII data shows up.
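Here's a minimal sketch of the bytes/text split, just to make it concrete. The
variable names are made up for illustration. It uses u"" and b"" literals so it
should run on Python 2.6+ as well as Python 3, where the type this thread calls
unicode is named str and the old str is named bytes.

```python
# Text: a sequence of characters, no encoding attached.
text = u"caf\u00e9"              # 4 characters, the last one is e-acute

# encode() turns text into bytes using a named character set.
data = text.encode("utf-8")
assert len(text) == 4            # counted in characters
assert len(data) == 5            # counted in bytes; e-acute takes 2 in UTF-8
assert data == b"caf\xc3\xa9"

# decode() turns bytes back into text, given the same character set.
assert data.decode("utf-8") == text

# The same text in a different encoding is a different run of bytes.
latin = text.encode("latin-1")
assert latin == b"caf\xe9"

# Guessing the wrong encoding is where the errors come from.
try:
    latin.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

The point of the last few lines: bytes don't know what encoding they're in, so
the decode step is where you have to say it.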
> OK, now you threw me again. the stuff is in memory. how about we invent a
> hex encoding that does a hexdump of whatever is in memory?
Try GDB. Seriously, it doesn't matter how it's represented internally by
Python. Heck, a Python int isn't a C int either.
> What happens if you pickle one of these suckers?
It gets written in some internal binary format that pickle understands. How
does a list get pickled?
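A quick sketch of that round trip, assuming nothing beyond the standard pickle
module (the names here are just for illustration). You never look inside the
pickled bytes; you only check that what comes back equals what went in.

```python
import pickle

text = u"na\u00efve"                 # text with one non-ASCII character
items = [1, u"two", text]            # a list, for comparison

# dumps() produces an opaque byte string in pickle's own format.
text_blob = pickle.dumps(text)
list_blob = pickle.dumps(items)
assert isinstance(text_blob, bytes)

# loads() gives back an equal object; the format in between is pickle's business.
restored_text = pickle.loads(text_blob)
restored_items = pickle.loads(list_blob)
assert restored_text == text
assert restored_items == items
```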
From within Python, a unicode is composed of a sequence of 1-character
unicodes. That's all you need to know to get your work done.
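For instance, a small sketch (names are illustrative only; it runs on Python
2.6+ and on Python 3, where this type is called str):

```python
text = u"\u00e9t\u00e9"              # the French word for "summer", 3 characters

# Length and indexing work in characters, not bytes.
assert len(text) == 3
first = text[0]
assert first == u"\u00e9"
assert len(first) == 1
assert type(first) is type(text)     # one character is itself a unicode

# Iterating yields those 1-character strings in order.
chars = list(text)
assert chars == [u"\u00e9", u"t", u"\u00e9"]

# The byte count only shows up once you pick an encoding.
utf8_length = len(text.encode("utf-8"))
assert utf8_length == 5
```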
--
Peter Fein || 773-575-0694 || pfein at pobox.com
http://www.pobox.com/~pfein/ || PGP: 0xCCF6AE6B
irc: pfein at freenode.net || jabber: peter.fein at gmail.com