[Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

Ray Jones crawlzone at gmail.com
Wed Sep 5 11:42:31 CEST 2012


I have directory names that contain Russian characters, Romanian
characters, French characters, etc. When I search for a file using
glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the
directory names. I thought simply identifying them as Unicode would
clear that up. Nope. Now I have stuff like \u0456\u0439\u043e.

These directory names are eventually going to be passed to Dolphin
(my file manager). Will they reach Dolphin intact? Do I need to run a
conversion? Can that happen automatically within the script, given
that the various kinds of characters are all mixed together in the same
directory? (A "# coding: Latin-1" line at the top of the script is not
going to cover all of them.)
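
A hedged sketch of the hand-off, assuming a "dolphin" executable on PATH and a UTF-8 filesystem encoding. A "# coding:" line only tells Python how to decode the script's own source file; it does not affect names read from disk at runtime. A single Unicode string can hold Cyrillic, French and Romanian letters together. The key point is to pass the string object itself in an argument list, never its repr(). In Python 3, subprocess encodes str arguments with the filesystem encoding. In 2.7 the rules differ, and passing the byte strings glob returns unchanged is likely simplest. The bytes_seen_by_child() helper is a hypothetical stand-in for Dolphin, used only to check that the exact bytes arrive in the child process.

```python
import os
import subprocess
import sys


def open_in_dolphin(path):
    """Launch Dolphin on *path* (assumes 'dolphin' is on PATH).

    The path goes in as one list element, so no shell quoting or manual
    encoding is needed; subprocess encodes str arguments itself.
    """
    return subprocess.Popen(["dolphin", path])


def bytes_seen_by_child(path):
    """Hypothetical stand-in for Dolphin, used only to check the round trip.

    Runs a child Python process that reports, as hex, the raw bytes it
    received as its first command-line argument.
    """
    code = "import os, sys; print(os.fsencode(sys.argv[1]).hex())"
    out = subprocess.check_output([sys.executable, "-c", code, path])
    return bytes.fromhex(out.decode("ascii").strip())


# Hypothetical path mixing Cyrillic, French and Romanian letters.
name = "Ящик/Élève/Pădure"
received = bytes_seen_by_child(name)
print(received == os.fsencode(name))
```

If the comparison holds, the child saw exactly the bytes the filesystem uses for that name, with no conversion step written by hand.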

While on the subject, I just read through the Unicode documentation
for Python 2.7.3. The history was interesting, but the implementation
portion was beyond me. I was looking for a way for a Russian 'backward
R' to look like a Russian 'backward R', not for a bunch of \xNN and
\uNNNN stuff.
Can someone point me to a page that will clarify the concepts, not just
try to show me the Python implementation of what I already don't
understand? ;)
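
The core concept may be easier to see in a few lines: a character is an abstract code point (U+042F for the "backward R", Я), and an encoding is a rule for turning code points into bytes. The \xNN and \uNNNN sequences are only ways of writing bytes and code points as plain ASCII. This sketch uses Python 3 syntax. In 2.7 the literal would need a u prefix (u"\u042f"), and the dict comprehension would still work. The codec choices are illustrative.

```python
import unicodedata

ch = "\u042f"  # the Cyrillic letter that looks like a backward R
print(ch, "U+%04X" % ord(ch), unicodedata.name(ch))

# One character, several byte spellings: the encoding decides the bytes.
spellings = {codec: ch.encode(codec) for codec in ("utf-8", "utf-16-le", "cp1251")}
for codec, raw in spellings.items():
    print(codec, raw)

# Latin-1 has no slot for Cyrillic, so it cannot store this letter at all.
try:
    ch.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1:", exc.reason)

# Reading UTF-8 bytes with the wrong codec yields mojibake, not an error.
print(spellings["utf-8"].decode("latin-1"))
```

The letter stays the same throughout. Only its byte spelling changes with the encoding, which is why decoding with the right codec matters.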

Thanks


Ray

