Changing filenames from Greeklish => Greek (subprocess complain)
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Jun 3 02:46:46 EDT 2013
On Sun, 02 Jun 2013 22:05:28 -0700, Νικόλαος Κούρας wrote:
> Why subprocess fails when it has to deal with a greek filename? and that
> an indirect call too....
It doesn't. The command you are calling fails, not subprocess.
The code you show is this:
/home/nikos/public_html/cgi-bin/metrites.py in ()
    217         template = htmldata + counter
    218     elif page.endswith('.py'):
=>  219         htmldata = subprocess.check_output( '/home/nikos/public_html/cgi-bin/' + page )
    220         template = htmldata.decode('utf-8').replace( 'Content-type: text/html; charset=utf-8', '' ) + counter
The first step is to inspect the value of the file name. Normally I would
just call print, but since this is live code, and a web server, you
probably don't want to use print directly. But you can print to a file,
and then inspect the file. Using logging is probably better, but here's a
real quick and dirty way to get the same result:
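    elif page.endswith('.py'):
        name = '/home/nikos/public_html/cgi-bin/' + page
        # Write the name to a file rather than to the web server's stdout.
        # errors='surrogateescape' keeps a badly encoded name from raising here.
        with open('/home/nikos/out.txt', 'w', encoding='utf-8',
                  errors='surrogateescape') as f:
            print(name, file=f)
        htmldata = subprocess.check_output(name)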
Now inspect /home/nikos/out.txt using the text editor of your choice. What does
it contain? Is the file name of the executable what you expect? Does it
exist, and is it executable?
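
For the logging route mentioned above, here is a minimal sketch that records
the name and answers the "does it exist, is it executable" questions in one
go. The helper names make_debug_logger and check_executable, and the logger
name, are made up for illustration; they are not part of your script.

```python
import logging
import os


def make_debug_logger(log_path):
    """Return a logger that appends DEBUG messages to log_path (hypothetical helper)."""
    logger = logging.getLogger('metrites.debug')  # logger name is an assumption
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path, encoding='utf-8')
        handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
        logger.addHandler(handler)
    return logger


def check_executable(path, logger):
    """Log the path we are about to run; return (exists, executable)."""
    exists = os.path.isfile(path)
    executable = os.access(path, os.X_OK)
    # ascii() escapes non-ASCII characters and lone surrogates, so the log
    # call itself cannot fail on a badly encoded name.
    logger.debug('name=%s exists=%s executable=%s',
                 ascii(path), exists, executable)
    return exists, executable
```

In the CGI branch you would call something like
check_executable(name, make_debug_logger('/home/nikos/out.txt')) just before
subprocess.check_output(name), then read the log file afterwards.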
The next step, after checking that, is to check the executable .py file.
It may contain a bug which is causing this problem. However, I think I
can guess what the nature of the problem is.
The output you show includes:
cmd = '/home/nikos/public_html/cgi-bin/files.py'
output = b'Content-type: text/html; charset=utf-8\n\n<bod...n
position 74: surrogates not allowed\n\n-->\n\n'
My *guess* of your problem is this: your file names have invalid bytes in
them, when interpreted as UTF-8.
Remember, on a Linux system, file names are stored as bytes. So the
file-name-as-a-string needs to be *encoded* into bytes. My *guess* is that
somehow, when renaming your files, you gave them a name which may be
correctly encoded in some other encoding, but not in UTF-8. Then, when
Python reads those names as UTF-8, it hits bytes that are not legal UTF-8
and substitutes lone surrogates for them; as soon as such a name has to be
encoded back to UTF-8, you get "surrogates not allowed" and everything
blows up.
Something like this:
py> s = "Νικόλαος Κούρας"
py> b = s.encode('ISO-8859-7') # Oh oh, wrong encoding!
py> print(b)
b'\xcd\xe9\xea\xfc\xeb\xe1\xef\xf2 \xca\xef\xfd\xf1\xe1\xf2'
py> b.decode('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 0: invalid continuation byte
Obviously the error is a little different, because the original string is
different.
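
The "surrogates not allowed" wording itself can be reproduced too. This is
only a sketch, and it assumes the bad name reached your script through
Python's surrogateescape error handler, which is what Python 3 uses for file
names on POSIX systems. The sample name and the ISO-8859-7 encoding are
illustrative guesses, not something I know about your files.

```python
# Bytes as they might sit on disk if the name was written as ISO-8859-7
# (an assumption made purely for illustration).
raw = "Νικόλαος".encode('ISO-8859-7')

# Decoding with surrogateescape turns each undecodable byte into a lone
# surrogate instead of raising an error.
name = raw.decode('utf-8', errors='surrogateescape')

# A strict UTF-8 encode of that string is what fails.
try:
    name.encode('utf-8')
    message = None
except UnicodeEncodeError as err:
    message = str(err)
print(message)

# Encoding with the same handler gives back the original bytes unchanged.
roundtrip = name.encode('utf-8', errors='surrogateescape')
print(roundtrip == raw)
```

So a name can travel through Python without complaint and only blow up at
the point where something tries to print it or send it out as real UTF-8.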
If I am right, the solution is to fix the file names to ensure that they
are all valid UTF-8 names. If you view the directory containing these
files in a file browser that supports UTF-8, do you see any file names
containing Mojibake?
http://en.wikipedia.org/wiki/Mojibake
Fix those file names, and hopefully the problem will go away.
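
If you would rather not eyeball the directory, something along these lines
could list the offending names. It is a rough sketch: the function names are
hypothetical, and the ISO-8859-7 default in suggest_utf8_name is only my
guess at how the names were written, so check each suggestion before
renaming anything.

```python
import os


def is_valid_utf8(raw):
    """Return True if the bytes object raw decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_names(directory):
    """Return the raw (bytes) names in directory that are not valid UTF-8."""
    # Passing a bytes path makes os.listdir() return undecoded bytes names.
    names = os.listdir(os.fsencode(directory))
    return sorted(raw for raw in names if not is_valid_utf8(raw))


def suggest_utf8_name(raw, assumed_encoding='iso-8859-7'):
    """Re-encode raw as UTF-8, ASSUMING it was written in assumed_encoding."""
    return raw.decode(assumed_encoding).encode('utf-8')
```

Once a suggestion looks right, os.rename() accepts bytes paths, so the old
and new names can be passed to it as bytes without decoding them first.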
--
Steven