Changing filenames from Greeklish => Greek (subprocess complain)
Νικόλαος Κούρας
nikos.gr33k at gmail.com
Wed Jun 5 02:05:33 EDT 2013
On Wednesday, 5 June 2013 8:40:39 AM UTC+3, Michael Torrie wrote:
> On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
> > One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
> > filename with spaces. Is there a problem when a filename contains both
> > English and Greek letters? Isn't it still a unicode string?
> >
> > All I did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
> > Ιησού.mp3"'
> >
> > and the displayed filename after 'ls -l' returned was:
> >
> > is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
> > \364\357\365\ \311\347\363\357\375.mp3
> >
> > There is no way at all to check the charset used to store it in hdd?
> > It should be UTF-8, but it doesn't look like it. Is there some linux
> > command or some python command that will print out the actual
> > encoding of '\305\365\367\336\ \364\357\365\
> > \311\347\363\357\375.mp3' ?
>
> I can see that you are starting to understand things. I can't answer
> your question (don't know the answer), but you're correct about one
> thing. A filename is just a sequence of bytes. We'd hope it would be
> utf-8, but it could be anything. Even worse, it's not possible to tell
> from a byte stream what encoding it is unless we just try one and see
> what happens. Text editors, for example, have to either make a guess
> (utf-8 is a good one these days), or ask, or try to read from the first
> line of the file using ascii and see if there's a source code character
> set command to give it an idea.
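The "just try one and see what happens" approach can be sketched in Python roughly as follows. This is only an illustration: the helper name guess_decodings and the list of candidate encodings are my own assumptions, and a successful decode does not prove which encoding was actually used, only that the bytes are valid in it.

```python
# Bytes of the filename as printed by `ls` (octal escapes), with the
# backslash-escaped spaces written as plain spaces.
RAW_NAME = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"

# Hypothetical candidate list (an assumption, not an exhaustive set):
# UTF-8 first because it is strict, then two single-byte Greek encodings.
CANDIDATES = ("utf-8", "iso8859_7", "cp1253")


def guess_decodings(raw, candidates=CANDIDATES):
    """Return (encoding, text) pairs for every candidate that decodes raw."""
    results = []
    for name in candidates:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding; try the next one.
            continue
    return results


decodings = guess_decodings(RAW_NAME)
```

For the bytes shown in the ls output, UTF-8 is rejected, while both ISO-8859-7 and cp1253 decode them to "Ευχή του Ιησού.mp3", so the name on disk looks like a single-byte Greek encoding rather than UTF-8.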
Um, is there a way, even if we don't actually know the encoding CentOS used to store the filename on the hdd, to tell Python to just open the byte stream as it is?
I don't know if it's possible, but I am looking for a way to skip the encoding, since we have no way of knowing what it is.
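For what it's worth, here is a minimal sketch of what "skipping the encoding" might look like in Python 3, assuming a POSIX system. The function names list_raw_names and read_raw are hypothetical; the calls they rely on (os.listdir with a bytes argument, os.fsencode, and open with a bytes path) are standard library behaviour.

```python
import os


def list_raw_names(directory="."):
    """Return the directory entries as undecoded bytes."""
    # Passing a bytes path makes os.listdir return bytes names,
    # exactly as the filesystem stores them, with no decoding step.
    return os.listdir(os.fsencode(directory))


def read_raw(directory, raw_name):
    """Open a file by its raw bytes name and return its contents."""
    # Joining bytes with bytes keeps the whole path undecoded.
    path = os.path.join(os.fsencode(directory), raw_name)
    with open(path, "rb") as handle:
        return handle.read()
```

The idea is that the name is never turned into a str, so no charset has to be guessed just to open the file. Displaying the name to a user would still need a decoding decision.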
This is very weird because:
nikos at superhost.gr [~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
nikos at superhost.gr [~]#
All I did was a simple rename from English to Greek. Since the locale is set to use UTF-8, shouldn't the result on the hdd be a UTF-8 byte stream?