Unicode in cgi-script with apache2
Dominique Ramaekers
dominique at ramaekers-stassart.be
Sat Aug 16 18:49:47 EDT 2014
Hi Peter,
Your code seems interesting.
I've tried using sys.stdout (in a slightly different form) but it gave
the same error.
I also read about people who fixed the error by changing the server's
locale to en_US.UTF-8. The people who posted these fixes also said that
you can only use en_US.UTF-8 (and not, e.g., nl_BE.UTF8)... Anyway, it
didn't work for me. And I find this a dirty fix because I don't want to
use a US locale...
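For reference, a locale-independent variant is possible by naming the encoding explicitly on both the read side and the write side. This is only a sketch under assumptions: index.html is taken to be UTF-8 encoded, the helper name serve_file is hypothetical and not from this thread, and the cache headers of the original script are left out for brevity.

```python
#!/usr/bin/env python3
import io


def serve_file(path, out):
    """Write a CGI header block and the file at `path` to the binary stream `out`.

    Assumption: the file is UTF-8 encoded. `serve_file` is a hypothetical
    helper for illustration, not part of any library.
    """
    # Wrap the binary stream so the output encoding no longer depends on
    # the locale that the web server hands to the CGI process.
    text_out = io.TextIOWrapper(out, encoding="utf-8", newline="\n")
    text_out.write("Content-Type: text/html; charset=utf-8\n")
    text_out.write("\n")
    # Name the input encoding explicitly instead of relying on the default.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            text_out.write(line)
    text_out.flush()
    # Detach so the underlying stream is not closed together with the wrapper.
    text_out.detach()
```

In the CGI script itself this would be called as serve_file("/var/www/cgi-data/index.html", sys.stdout.buffer) after an import sys, so the server locale can stay whatever it is.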
Please excuse me for not trying out your specific solutions. I've already
started to implement WSGI over CGI. See my previous message...
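A minimal sketch of what the WSGI version of the script might look like, for readers who have not seen the previous message. The factory name make_app is hypothetical, the file is assumed to be UTF-8, and the actual implementation may well differ. Because a WSGI response body is bytes, the file can be read in binary mode and no decoding step is involved.

```python
def make_app(path):
    """Return a WSGI application that serves the file at `path` as HTML.

    `make_app` is a hypothetical factory used for illustration only.
    Assumption: the file on disk is UTF-8 encoded.
    """
    def application(environ, start_response):
        # Read raw bytes; the charset is declared in the header instead.
        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Cache-Control", "no-cache, must-revalidate"),
            ("Content-Length", str(len(body))),
        ])
        # WSGI expects an iterable of bytes objects.
        return [body]
    return application


# Many WSGI servers (mod_wsgi among them) look for a module-level
# callable named `application` by default.
application = make_app("/var/www/cgi-data/index.html")
```

For a quick local check, wsgiref.simple_server.make_server("", 8000, application).serve_forever() from the standard library can serve it without Apache.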
grz
On 16-08-14 at 13:17, Peter Otten wrote:
> Dominique Ramaekers wrote:
>
>> I've got a little script:
>>
>> #!/usr/bin/env python3
>> print("Content-Type: text/html")
>> print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
>> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
>> print("")
>> f = open("/var/www/cgi-data/index.html", "r")
>> for line in f:
>>     print(line, end='')
>>
>> If I run the script in the terminal, it nicely prints the webpage
>> 'index.html'.
>>
>> If I access the script through a web browser, Apache gives an error:
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 1791: ordinal not in range(128)
>>
>> I've spent a whole afternoon reading forums and blogs, but I don't have a
>> solution.
>>
>> Can anyone help me?
> If the input and output encoding are the same you can avoid the byte-to-text
> (and subsequent text-to-byte) conversion and serve the binary contents of
> the index.html file directly:
>
> #!/usr/bin/env python3
> import sys
>
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> sys.stdout.flush()
> with open("/var/www/cgi-data/index.html", "rb") as f:
>     for line in f:
>         sys.stdout.buffer.write(line)
>
> The flush() is necessary to write pending data before accessing the low-level
> stdout.buffer. Instead of the loop you can use any of these:
>
> sys.stdout.buffer.write(f.read()) # not for huge files, but should be OK for
> # typical html file sizes
> sys.stdout.buffer.writelines(f)
> shutil.copyfileobj(f, sys.stdout.buffer) # show off your knowledge
> # of the stdlib ;)
>
>
> Alternatively you could choose an encoding via the locale:
>
> #!/usr/bin/env python3
> import locale
> locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
>
> print("Content-Type: text/html")
> print("Cache-Control: no-cache, must-revalidate") # HTTP/1.1
> print("Expires: Sat, 26 Jul 1997 05:00:00 GMT") # Date in the past
> print("")
> with open("/var/www/cgi-data/index.html") as f:
>     for line in f:
>         print(line, end='')
>
> Python should then use UTF-8 as the default for I/O and the resulting
> script looks more familiar.
>
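Since the script works in the terminal but fails under Apache, it can help to see which encodings Python actually picked in each environment. The following is a small diagnostic sketch; the function name encoding_report is hypothetical, and the values it returns depend entirely on the environment the process was started in.

```python
import locale
import sys


def encoding_report():
    """Return the encodings Python selected for the current process.

    `encoding_report` is a hypothetical helper for illustration only.
    """
    return {
        # Locale-derived encoding, used by open() in text mode when
        # no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the text layer wrapped around standard output.
        "stdout": sys.stdout.encoding,
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }
```

Writing the report to sys.stderr from inside the CGI script should normally make it show up in the Apache error log, where it can be compared with the values printed in a terminal session.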