[Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

Ray Jones crawlzone at gmail.com
Wed Sep 5 16:51:52 CEST 2012


On 09/05/2012 07:31 AM, eryksun wrote:
> On Wed, Sep 5, 2012 at 5:42 AM, Ray Jones <crawlzone at gmail.com> wrote:
>> I have directory names that contain Russian characters, Romanian
>> characters, French characters, et al. When I search for a file using
>> glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the
>> directory names. I thought simply identifying them as Unicode would
>> clear that up. Nope. Now I have stuff like \u0456\u0439\u043e.
> This is just an FYI in case you were manually decoding. Since glob
> calls os.listdir(dirname), you can get Unicode output if you call it
> with a Unicode arg:
>
>     >>> t = u"\u0456\u0439\u043e"
>     >>> open(t, 'w').close()
>
>     >>> import glob
>
>     >>> glob.glob('*')  # UTF-8 output
>     ['\xd1\x96\xd0\xb9\xd0\xbe']
>
>     >>> glob.glob(u'*')
>     [u'\u0456\u0439\u043e']
Yes, I played around with that some. In my lack of understanding, I
thought that adding the 'u' would make the characters show up the way
they should appear. Silly me. ;)
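
For anyone following along, here is a minimal sketch of why the two glob results quoted above are the same name in two forms: one is the UTF-8 bytes, the other is the decoded Unicode string. The variable names are made up for illustration, and the sketch assumes the file system encoding is UTF-8, as the "UTF-8 output" comment suggests. It should run on both Python 2.6+ and Python 3.3+.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Byte string, as returned by glob.glob('*') in the quoted session.
raw_name = b'\xd1\x96\xd0\xb9\xd0\xbe'

# Unicode string, as returned by glob.glob(u'*') in the quoted session.
unicode_name = u'\u0456\u0439\u043e'

# Assumption: the names on disk are encoded as UTF-8.
decoded = raw_name.decode('utf-8')

print(decoded == unicode_name)       # True
print(len(raw_name), len(decoded))   # 6 3
```

The \u escapes seen at the Python 2 prompt are only how the interpreter displays a Unicode string inside a list; the string itself holds three characters, not the escape text.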
> Regarding subprocess.Popen, just use Unicode -- at least on a POSIX
> system. Popen calls an exec function, such as posix.execv, which
> handles encoding Unicode arguments to the file system encoding.
>
> On Windows, the _subprocess C extension in 2.x is limited to calling
> CreateProcessA with char* 8-bit strings. So Unicode characters beyond
> ASCII (the default encoding) trigger an encoding error.
Here is what I tried:

subprocess.call(['dolphin', '/my_home/testdir/\u044c\u043e\u0432'])

Dolphin's error message: 'The file or folder
/my_home/testdir/\u044c\u043e\u0432 does not exist'

But if I copy the characters as they appear in the Bash shell and paste
them into my subprocess.call(), Dolphin recognizes the directory just fine.

So is Dolphin Unicode-dead, or am I missing something?
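
One possible explanation, offered as a guess rather than a confirmed diagnosis: the path in the call above has no u prefix. In Python 2 a plain '...' literal is a byte string, and \u escapes are not interpreted inside it, so Dolphin would receive the backslashes and hex digits literally. That would match the error message. The sketch below only compares the two kinds of literal; the variable names are invented for illustration, and the final subprocess call is left commented out as a hypothetical fix.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# In Python 2, a plain '...' literal keeps \u escapes as-is. Doubling
# the backslashes here spells out those same 18 characters explicitly,
# so the sketch behaves identically on Python 2 and Python 3.
plain_literal = '\\u044c\\u043e\\u0432'

# With the u prefix, the escapes become three Cyrillic characters.
unicode_literal = u'\u044c\u043e\u0432'

print(len(plain_literal), len(unicode_literal))  # 18 3

# Hypothetical fix, reusing the path from the message above (not run here):
# subprocess.call(['dolphin', u'/my_home/testdir/\u044c\u043e\u0432'])
```

If that guess is right, it would also explain why pasting the characters from the shell works: the pasted text is already the encoded bytes of the real directory name, not the escape sequences.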


Ray

