
So here is the summary question for this thread: what exactly is a surrogate? I think I get it (from reading a l18n email from MAL on the l18n list), but I am not confident enough to stick in the summary as of yet. The following is my current rough summary explanation for what a surrogate is. Can someone please correct it as needed? """ In Unicode, a surrogate is when you encode from a higher bit total encoding (such as utf-16) into a smaller bit total encoding by representing the character as several more bit chunks (such as two utf-8 chunks). The following line is an example: >>> u'\ud800'.encode('utf-8') == '\xed\xa0\x80' Notice how the initial Unicode character ends up being encoded as three characters in utf-8. """ Also, anyone know of some good Unicode tutorials, explanations, etc. on the web, in book form, whatever? Most of the threads that I don't totally comprehend are Unicode related and I would like to minimize my brain-dead questions to a minimum. Don't want my reputation to go down the drain. =) -Brett