file.write() of non-ASCII characters behaves differently in the interactive interpreter than in a script run
RAH
rene.heymans at gmail.com
Tue Aug 25 17:19:53 EDT 2015
Dear All,
I have run into a behavior I cannot explain (and I have already spent many hours on it): `file.write('string')` raises an error when the code runs as a script (here, a WSGI application under Apache2), but not when the same statements are typed at the interactive console. The error occurs only if the string contains non-ASCII characters; with pure ASCII input there is no error.
The following example shows what I see. I must be overlooking something, because I cannot believe that Python treats interactive and script execution differently, and yet ... Could someone please look into this?
Thank you in advance.
René
Code extract from WSGI application (reply.py)
=============================================
request_body = environ['wsgi.input'].read(request_body_size) # bytes
rb = request_body.decode() # string
d = parse_qs(rb) # dict
f = open('logbytes', 'ab')
g = open('logstr', 'a')
h = open('logdict', 'a')
f.write(request_body)
g.write(str(type(request_body)) + '\t' + str(type(rb)) + '\t' + str(type(d)) + '\n')
h.write(str(d) + '\n')  # <--- line 28 of the application
h.close()
g.close()
f.close()
Tail of Apache2 error.log
=========================
[Tue Aug 25 20:24:04.657933 2015] [wsgi:error] [pid 3677:tid 3029764928] [remote 192.168.1.5:27575] File "reply.py", line 28, in application
[Tue Aug 25 20:24:04.658001 2015] [wsgi:error] [pid 3677:tid 3029764928] [remote 192.168.1.5:27575] h.write(str(d) + '\\n')
[Tue Aug 25 20:24:04.658201 2015] [wsgi:error] [pid 3677:tid 3029764928] [remote 192.168.1.5:27575] UnicodeEncodeError: 'ascii' codec can't encode character '\\xc7' in position 15: ordinal not in range(128)
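
The same exception can be provoked outside Apache by forcing ASCII as the file encoding. The following is only a sketch: that the WSGI process really opens text files as ASCII is an assumption, suggested by the 'ascii' codec named in the traceback. The helper `try_write` and the temporary file are hypothetical and not part of reply.py.

```python
import os
import tempfile

# Same text that line 28 of the application tries to write.
text = str({'userName': ['Ça va !']}) + '\n'

def try_write(encoding):
    """Append `text` to a temporary file opened with `encoding`.

    Returns the UnicodeEncodeError if one is raised, otherwise None.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'a', encoding=encoding) as fh:
            fh.write(text)
        return None
    except UnicodeEncodeError as exc:
        return exc
    finally:
        os.remove(path)

ascii_error = try_write('ascii')   # assumed to mimic the WSGI process
utf8_error = try_write('utf-8')    # assumed to mimic the console session

print('ascii :', ascii_error)
print('utf-8 :', utf8_error)
```

If this assumption holds, passing `encoding='utf-8'` explicitly to `open('logdict', 'a', ...)` should avoid the error regardless of how the process was started.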
Checking what has been logged
=============================
rse@Alibaba:~/test$ cat logbytes
userName=Ça va !    <--- this was indeed the input (notice the French C with cedilla:
                         Unicode U+00C7, ALT-0199, UTF-8 C3 87).
                         Reading the logbytes file, one can verify that Ç is indeed
                         represented by the 2 bytes \xC3 and \x87.
rse@Alibaba:~/test$ cat logstr
<class 'bytes'> <class 'str'> <class 'dict'>
rse@Alibaba:~/test$ cat logdict
rse@Alibaba:~/test$ <--- Obviously empty because of the error
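
The byte values and the position reported in the traceback can be cross-checked with a few lines of Python. This is a sketch that assumes the request body was decoded as UTF-8 (in reply.py, `decode()` is called without arguments, which defaults to UTF-8 in Python 3); the variable names `raw`, `body` and `text` are illustrative only.

```python
from urllib.parse import parse_qs

raw = b'userName=\xc3\x87a va !'   # the bytes as they appear in logbytes
body = raw.decode('utf-8')          # 'userName=Ça va !'
d = parse_qs(body)                  # same shape as the dict in reply.py
text = str(d)

print(body)
print('Ç'.encode('utf-8'))          # the two bytes 0xC3 0x87
print(text.index('Ç'))              # offset of Ç inside str(d)

# Encoding the dictionary text as ASCII fails at that same offset.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc)
```

The offset of Ç inside `str(d)` matches the "position 15" in the Apache log, which suggests the failure happens while the text is being encoded for the file, not while the request is being parsed.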
Trying similar code within the Python interpreter
=================================================
rse@Alibaba:~/test$ python
Python 3.4.0 (default, Jun 19 2015, 14:18:46)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> di = {'userName': ['Ça va !']} <--- A dictionary
>>> str(di)
"{'userName': ['Ça va !']}" <--- and its string representation
>>> type(str(di))
<class 'str'> <--- Is a string indeed
>>> fi = open('essai', 'a')
>>> fi.write(str(di) + '\n')
26 <--- It works well
>>> fi.close()
>>>
Checking what has been written
==============================
rse@Alibaba:~/test$ cat essai
{'userName': ['Ça va !']} <--- The result is correct
rse@Alibaba:~/test$
No error if all ASCII
=====================
If the input is `userName=Rene`, for instance, there is no error, and
`logdict` does indeed contain the text of the dictionary
`{'userName': ['Rene']}`
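
To compare the two environments, a small diagnostic along these lines could be run both at the console and inside the WSGI application. It is a sketch only: `default_text_encoding` is a hypothetical helper, and inside mod_wsgi the values would probably have to be written to a log file (opened in binary mode, or with an explicit encoding) rather than printed. In text mode, `open()` without an `encoding` argument normally falls back to `locale.getpreferredencoding(False)`, so a difference in the `LANG` or `LC_ALL` environment variables between the shell and the Apache process would be one possible explanation.

```python
import locale
import os
import tempfile

def default_text_encoding():
    """Return the encoding open() selects when none is passed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'a') as fh:
            return fh.encoding
    finally:
        os.remove(path)

report = {
    'locale preferred': locale.getpreferredencoding(False),
    'open() default': default_text_encoding(),
    'LANG': os.environ.get('LANG'),
    'LC_ALL': os.environ.get('LC_ALL'),
}

for key, value in report.items():
    print(key, '=', value)
```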