Changing filenames from Greeklish => Greek (subprocess complain)
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Jun 5 02:03:41 EDT 2013
On Tue, 04 Jun 2013 21:15:23 -0700, Νικόλαος Κούρας wrote:
> One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek filename
> with spaces.
> Is there a problem when a filename contain both english and greek
> letters? Isn't it still a unicode string?
No problem, and Unicode includes both English and Greek letters.
> All i did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του Ιησού.mp3"
That's not what you wrote earlier. You said you used FileZilla to
transfer the files from Windows 8.
> and the displayed filename after 'ls -l' returned was:
>
> is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
> \364\357\365\ \311\347\363\357\375.mp3
>
> There is no way at all to check the charset used to store it in hdd? It
> should be UTF-8, but it doesn't look like it. Is there some Linux
> command or some python command that will print out the actual encoding
> of '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3' ?
You have misunderstood.
The Linux file system does not track encodings. It just stores bytes.
There is no *reliable* way to guess the encoding that a bunch of bytes
came from. If your bytes look like
0x48 0x65 0x6c 0x6c 0x6f 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21
(ASCII "Hello world!") then you might *guess* that the encoding is ASCII,
or UTF-8, or Latin-1. But in general, you can't go from the bytes to the
encoding. Encodings are out-of-band information.
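A minimal sketch of the point in Python 3, assuming the octal escapes that
ls printed are the raw bytes of the file name. The helper try_decodings is a
made-up name for illustration, and the list of candidate encodings is only a
guess at what might be worth trying:

```python
# The "Hello world!" bytes from above: three encodings agree on them.
hello = bytes([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20,
               0x77, 0x6f, 0x72, 0x6c, 0x64, 0x21])

# First word of the name as ls displayed it: \305\365\367\336 (octal).
name = bytes([0o305, 0o365, 0o367, 0o336])


def try_decodings(raw, encodings):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


hello_results = try_decodings(hello, ["ascii", "utf-8", "latin-1"])
name_results = try_decodings(name, ["utf-8", "latin-1", "iso8859_7"])

for enc, text in hello_results.items():
    print(enc, repr(text))
for enc, text in name_results.items():
    print(enc, repr(text))
```

The first set of bytes decodes to the same text under all three encodings, so
nothing in the bytes says which one was used. The second set is not valid
UTF-8, yet it decodes without error as both Latin-1 and ISO-8859-7. Only
someone who already expects a Greek name can say which result is the
intended one, and that is a guess from context, not something stored with
the file.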
--
Steven