Changing filenames from Greeklish => Greek (subprocess complain)
Νικόλαος Κούρας
nikos.gr33k at gmail.com
Thu Jun 6 07:16:44 EDT 2013
On Thursday, 6 June 2013 1:24:16 PM UTC+3, Cameron Simpson wrote:
> On 05Jun2013 11:43, Νίκος Γκρ33κ <nikos.gr33k at gmail.com> wrote:
> | On Wednesday, 5 June 2013 9:32:15 PM UTC+3, MRAB wrote:
> | > Using Python, I think you could get the filenames using os.listdir,
> | > passing the directory name as a bytestring so that it'll return the
> | > names as bytestrings.
> | >
> | > Then, for each name, you could decode from its current encoding and
> | > encode to UTF-8 and rename the file, passing the old and new paths to
> | > os.rename as bytestrings.
> |
> | I am not sure I follow:
> |
> | Change this:
> |
> | # Compute a set of current fullpaths
> | fullpaths = set()
> | path = "/home/nikos/public_html/data/apps/"
> |
> | for root, dirs, files in os.walk(path):
> [...]
>
> Have a read of this:
>
> http://docs.python.org/3/library/os.html#os.listdir
>
> The UNIX API accepts bytes for filenames and paths.
>
> Python 3 strs are sequences of Unicode code points. If you try to
> open a file or directory on a UNIX system using a Python str, that
> string must be converted to a sequence of bytes before being handed
> to the OS.
>
> This is done implicitly using your locale settings if you just use a str.
>
> However, if you pass a bytes to open or listdir, this conversion
> does not take place. You put bytes in and in the case of listdir
> you get bytes out.
>
> You can work on pathnames in bytes and never concern yourself with
> encode/decode at all.
>
> In this way you can write code that does not care about the translation
> between Unicode and some arbitrary byte encoding.
>
> Of course, the issue will still arise when accepting user input;
> your shell has done exactly this kind of thing when you renamed
> your MP3 file. But it is possible to write pure utility code that
> doesn't care about filenames as Unicode or str if you work purely
> in bytes.
>
> Regarding user filenames, the common policy these days is to use
> utf-8 throughout. Of course you need to get everything into that
> regime to start with.
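
For reference, MRAB's suggestion quoted above (list the directory as bytes, decode each name from its current encoding, encode it to UTF-8, rename) could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name rename_to_utf8 is made up, and the legacy encoding is assumed to be ISO-8859-7 (Greek), which the thread does not confirm. Names that already decode as UTF-8 are left alone.

```python
import os

# Assumption: the old filenames are ISO-8859-7; replace with the real encoding.
LEGACY_ENCODING = "iso-8859-7"


def rename_to_utf8(directory: bytes, legacy_encoding: str = LEGACY_ENCODING) -> list:
    """Rename entries of `directory` whose names are not valid UTF-8.

    `directory` is a bytes path, so os.listdir returns bytes names and no
    implicit decoding takes place. Returns a list of (old, new) name pairs.
    """
    renamed = []
    for name in os.listdir(directory):      # bytes in, bytes out
        try:
            name.decode("utf-8")
            continue                         # already valid UTF-8, leave it
        except UnicodeDecodeError:
            pass
        # Raises UnicodeDecodeError if the name is not valid in the assumed
        # legacy encoding either; such a file is deliberately not touched.
        new_name = name.decode(legacy_encoding).encode("utf-8")
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new_name))
        renamed.append((name, new_name))
    return renamed
```

It would be called with a bytes path, for example rename_to_utf8(b"/home/nikos/public_html/data/apps/"). Trying it on a copy of the directory first seems prudent, since a wrong guess for the legacy encoding yields wrong (though still valid UTF-8) names.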
So I need to use os.listdir() to get those filenames as bytes. Okay.

So by changing this:

    fullpaths = set()
    path = "/home/nikos/public_html/data/apps/"
    for root, dirs, files in os.walk(path):
        for fullpath in files:
            fullpaths.add( os.path.join(root, fullpath) )

to this:

    # Compute a set of current fullpaths
    fullpaths = os.listdir( '/home/nikos/public_html/data/apps/' )

    # Load'em
    for fullpath in fullpaths:
        try:
            # Check the presence of a file against the database and insert if it doesn't exist
            cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,) )
            data = cur.fetchone() #URL is unique, so should only be one

I get the following error:
-----------------------------
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] Original exception was:
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] Traceback (most recent call last):
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] File "files.py", line 67, in <module>
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,) )
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] query = query.encode(charset)
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcc5' in position 35: surrogates not allowed
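
A note on the '\udcc5' in that message: when os.listdir() is given a str path, Python 3 decodes each filename with the surrogateescape error handler (PEP 383), so a byte that is not valid in the filesystem encoding, here 0xC5, comes back as the lone surrogate U+DCC5. A strict UTF-8 encode, like the query.encode(charset) call in the traceback, then rejects it. The snippet below is a small sketch that reproduces this without touching the disk; the filename bytes are made up for illustration, and UTF-8 is written out explicitly as the filesystem encoding rather than read from the locale.

```python
# Hypothetical filename bytes that are not valid UTF-8 (0xC5 is the byte
# behind the '\udcc5' in the traceback; the rest is invented).
raw_name = b"\xc5\xf5.mp3"

# Roughly what os.listdir(str_path) does with such a name on a UTF-8 system:
# undecodable bytes become lone surrogates instead of raising an error.
as_str = raw_name.decode("utf-8", "surrogateescape")

# A strict encode fails, as in the PyMySQL traceback.
error = None
try:
    as_str.encode("utf-8")
except UnicodeEncodeError as exc:
    error = exc

# Encoding with the same handler restores the original bytes exactly;
# os.fsencode() does this using the real filesystem encoding.
recovered = as_str.encode("utf-8", "surrogateescape")
```

So the str is only a lossless wrapper around the original bytes. To store a readable name in the database, the bytes would still have to be decoded from whatever encoding the file was actually named in.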