Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Mon Jun 3 01:22:33 EDT 2013


Ok, this email is something of a recital of how I approached this.

First, finding the Apache error log.

I restarted Apache:
  /etc/init.d/httpd restart

Then a:
  ps axf
gave me the PID of a running httpd. Examining its open files:
  lsof -p 9287

shows me:
  httpd   9287 nobody    2w   REG    0,192 12719609  56510510 /usr/local/apache/logs/error_log
  httpd   9287 nobody    7w   REG    0,192  7702310  56510512 /usr/local/apache/logs/access_log
among many others.

So, to monitor these logs:

  tail -F /usr/local/apache/logs/error_log /usr/local/apache/logs/access_log &

placing the tail in the background so I can still use that shell.

Watching the log while fetching the page:

  http://superhost.gr/

says:

  ==> /usr/local/apache/logs/error_log <==
  [Tue Apr 23 12:11:40 2013] [error] [client 54.252.27.86] suexec policy violation: see suexec log for more details
  [Tue Apr 23 12:11:40 2013] [error] [client 54.252.27.86] Premature end of script headers: metrites.py
  [Tue Apr 23 12:11:40 2013] [error] [client 54.252.27.86] File does not exist: /home/nikos/public_html/500.shtml
  [Tue Apr 23 12:11:43 2013] [error] [client 107.22.40.41] suexec policy violation: see suexec log for more details
  [Tue Apr 23 12:11:43 2013] [error] [client 107.22.40.41] Premature end of script headers: metrites.py
  [Tue Apr 23 12:11:43 2013] [error] [client 107.22.40.41] File does not exist: /home/nikos/public_html/500.shtml
  [Tue Apr 23 12:11:45 2013] [error] [client 79.125.63.121] suexec policy violation: see suexec log for more details
  [Tue Apr 23 12:11:45 2013] [error] [client 79.125.63.121] Premature end of script headers: metrites.py
  [Tue Apr 23 12:11:45 2013] [error] [client 79.125.63.121] File does not exist: /home/nikos/public_html/500.shtml

So:

You're using suexec in your Apache. This greatly complicates your debugging.

Suexec seems to be a facility for arranging that CGI scripts run as
the user who owns them. Because that has a lot of potential for ghastly
security holes, suexec performs a large number of strict checks on
CGI script locations, ownership and permissions before running a
CGI script.  At a guess the first hurdle would be that metrites.py
is owned by root. Suexec is very picky about what users it is
prepared to become. "root" is not one of them, as you might imagine.

I've chowned metrites.py to nikos:nikos. Suexec now lets it run, producing this:

Traceback (most recent call last):
  File "metrites.py", line 9, in <module>
    sys.stderr = open('/home/nikos/public_html/cgi.err.out', 'a')
PermissionError: [Errno 13] Permission denied: '/home/nikos/public_html/cgi.err.out'

That file is owned by root. metrites.py is being run as nikos.

So:

  chown nikos:nikos /home/nikos/public_html/cgi.err.out
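
For spotting this kind of ownership mismatch from Python, here is a
minimal sketch. The helper names owner_name and
writable_by_current_user are made up for illustration, and the demo
runs against a temporary file rather than the real paths from this
post; you would pass something like the cgi.err.out path instead.

```python
import os
import pwd
import tempfile

def owner_name(path):
    """Return the login name of the file's owner, or the numeric uid as text."""
    st = os.stat(path)
    try:
        return pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry on this machine.
        return str(st.st_uid)

def writable_by_current_user(path):
    """True if the user running this process may write to the file."""
    return os.access(path, os.W_OK)

# Demo on a throwaway file (an assumption; not a path from the post).
with tempfile.NamedTemporaryFile() as tmp:
    demo_owner = owner_name(tmp.name)
    demo_writable = writable_by_current_user(tmp.name)

print(demo_owner, demo_writable)
```

Run as the same user the CGI script runs as, a False result for the
log file would point at the Permission denied error above.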


A page reload now shows this:

Error in sys.excepthook:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2334-2342: ordinal not in range(128)

  Original exception was:
  Traceback (most recent call last):
    File "metrites.py", line 226, in <module>
      print( template )
  UnicodeEncodeError: 'ascii' codec can't encode characters in position 30-38: ordinal not in range(128)

This shows you writing the string in template to stdout. In this CGI
environment the encoding for stdout is 'ascii', accepting only
characters with code points 0..127. I expect template contains more
than this, since the ASCII range is very US Anglocentric; Greek
characters for example won't encode into ascii.
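
As a small illustration of that failure outside the CGI script, the
sketch below encodes a Greek string both ways. The sample text and
the helper try_encode are made up for this example; the real contents
of template are not shown in this post.

```python
# Hypothetical sample text standing in for the Greek part of template.
sample = "Καλημέρα"

def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)

ascii_result = try_encode(sample, "ascii")   # an error message (str)
utf8_result = try_encode(sample, "utf-8")    # the encoded bytes

print(ascii_result)
print(utf8_result)
```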

As mentioned in the thread on python-list, Python will adopt your
terminal's encoding if used interactively but will be pretty
conservative if the output is not a terminal; ascii, as you see
above.
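
To see which encoding Python has actually picked for a stream, a
sketch along these lines can be dropped into a script. The helper
name describe_stream is made up, and what it reports depends on the
Python version and the locale settings of the environment, so treat
the output as something to inspect rather than a fixed answer.

```python
import sys

def describe_stream(stream):
    """Report whether a text stream is a terminal and which encoding it uses."""
    is_tty = stream.isatty()
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding

is_tty, encoding = describe_stream(sys.stdout)

# Report on stderr so the message stays out of the CGI response body.
print("stdout is a tty:", is_tty, "encoding:", encoding, file=sys.stderr)
```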

What you want is probably UTF-8 in the output byte stream.  But
let's check what the HTTP headers are saying, because _they_ tell
the browser the byte stream encoding. The headers and your program's
encoding must match. So:

    % wget -S -O - http://superhost.gr/
  --2013-04-23 19:34:38--  http://superhost.gr/
  Resolving superhost.gr (superhost.gr)... 82.211.30.133
  Connecting to superhost.gr (superhost.gr)|82.211.30.133|:80... connected.
  HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Date: Tue, 23 Apr 2013 09:34:46 GMT
    Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.0-fips mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
    Keep-Alive: timeout=5, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=utf-8
  Length: unspecified [text/html]
  Saving to: ‘STDOUT’

  <!--: spam
  Content-Type: text/html

  <body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->
  <body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> --> -->
  </font> </font> </font> </script> </object> </blockquote> </pre>

So, the Content-Type: header says: "text/html; charset=utf-8". So that's good.
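
The same check can be done in Python rather than by eye. This is
only a sketch: charset_of and same_encoding are made-up helper names,
and the header value is the one from the wget output above.

```python
import codecs
from email.message import Message

def charset_of(content_type):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def same_encoding(a, b):
    """Compare two encoding names after normalising aliases such as UTF8."""
    return codecs.lookup(a).name == codecs.lookup(b).name

header_charset = charset_of("text/html; charset=utf-8")
matches = same_encoding(header_charset, "UTF8")

print(header_charset, matches)
```

If the script were to emit some other encoding than the one named in
the header, same_encoding would return False and the browser would
misread the bytes.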

So I've imported codecs and added this line (which itself needs only the os module imported):

  sys.stdout = os.fdopen(1, 'w', encoding='utf-8')

under the setting of sys.stderr. If the cgi libraries run under
Python 3 there is probably a cleaner way to do this but I don't know how.

This just opens UNIX file descriptor 1 (standard output) from scratch
for write ('w') using the 'utf-8' encoding.

And now your CGI script runs, accepting strings sent to print().
sys.stdout now takes care of transcoding those strings (Unicode
character code points inside Python) into the utf-8 encoding required
in the output bytes.
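
One possible alternative, sketched here rather than tested against
this server, is to wrap the existing binary stream with
io.TextIOWrapper instead of reopening file descriptor 1. The helper
name utf8_writer and the sample text are made up, and the demo writes
to an in-memory buffer so it can be run anywhere. Newer Pythons (3.7
and later) also offer sys.stdout.reconfigure(encoding='utf-8').

```python
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            newline="\n", line_buffering=True)

# Demonstrate on an in-memory buffer rather than the real stdout.
buffer = io.BytesIO()
out = utf8_writer(buffer)
print("Καλημέρα", file=out)   # hypothetical sample text
out.flush()
written = buffer.getvalue()

# In a CGI script the same wrapper could be applied to the real stdout:
#   sys.stdout = utf8_writer(sys.stdout.buffer)
```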


