Access violation in Python shell
David Hughes
dfh at forestfield.co.uk
Sat Oct 19 02:22:42 EDT 2002
I wrote the following piece of code to clarify for myself what happens
when Python coerces byte strings into unicode if bytes > 127 are present
and the default encoding is ascii.
#---uni01.py---
# test the combinations of unicode with 8 bit bytes, i.e. not ascii

def cencode(s):
    """ Conditionally encode 's' if unicode else do nothing.
        Needed to prevent error if attempting to encode a
        byte string containing codes > 127
    """
    import types
    if isinstance(s, types.UnicodeType):
        return s.encode('latin-1')
    else:
        return s

ch = [ 'a', '\xe2', u'a', u'\xe2' ]
word = [ 'gateau', 'g\xe2teau', u'gateau', u'g\xe2teau']

for k in range(3):
    print '\nIteration', k+1
    for w in word:
        for c in ch:
            try:
                if c in w:
                    print 'yes', cencode(c), type(c), cencode(w), type(w)
                else:
                    print 'no', cencode(c), type(c), cencode(w), type(w)
            except UnicodeError, e:
                print 'Unicode error.', e, \
                      cencode(c), type(c), cencode(w), type(w)
            except TypeError, e:
                print 'Type error.', e, \
                      cencode(c), type(c), cencode(w), type(w)
print
---End---
It ran as expected on the first iteration, and the exceptions make sense
after thinking about them (although I don't know why some come up as
TypeErrors). But during the second iteration the value of ch[2] changes
somehow, even though it is still ok at the end of the first iteration. On
attempting to re-run the code, the Python shell terminates. This, or
something like it, was originally happening with Python 2.2, so I upgraded
to 2.2.2, where it still happens.
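For reference, the Unicode errors in the output below come from the implicit coercion itself: when a byte string meets a unicode string, the byte string is decoded with the default encoding (ascii here), so any byte above 127 fails. The sketch below performs that decoding explicitly. It is written for a current Python interpreter rather than 2.2, and the helper name try_ascii is made up for illustration.

```python
# Sketch only: explicit version of the decoding that Python 2 performs
# implicitly when a byte string meets a unicode string.
# try_ascii is a hypothetical helper, not part of uni01.py.

def try_ascii(data):
    """Return the decoded text, or None if 'data' is not pure ASCII."""
    try:
        return data.decode('ascii')
    except UnicodeError:
        # Raised for any byte value above 127.
        return None

plain = try_ascii(b'gateau')        # all bytes < 128, decodes fine
accented = try_ascii(b'g\xe2teau')  # 0xE2 > 127, decoding fails
```

Here plain holds the decoded text while accented is None, which matches the pattern in the output: only the byte strings containing \xe2 trigger the ASCII decoding error.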
---Output---
E:\Pydevsrc\Test\unicode>python
Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> execfile('uni01.py')
Iteration 1
yes a <type 'str'> gateau <type 'str'>
no G <type 'str'> gateau <type 'str'>
yes a <type 'unicode'> gateau <type 'str'>
no G <type 'unicode'> gateau <type 'str'>
yes a <type 'str'> gGteau <type 'str'>
yes G <type 'str'> gGteau <type 'str'>
Unicode error. ASCII decoding error: ordinal not in range(128) a <type 'unicode'> gGteau <type 'str'>
Unicode error. ASCII decoding error: ordinal not in range(128) G <type 'unicode'> gGteau <type 'str'>
yes a <type 'str'> gateau <type 'unicode'>
Type error. 'in <string>' requires character as left operand G <type 'str'> gateau <type 'unicode'>
yes a <type 'unicode'> gateau <type 'unicode'>
no G <type 'unicode'> gateau <type 'unicode'>
yes a <type 'str'> gGteau <type 'unicode'>
Type error. 'in <string>' requires character as left operand G <type 'str'> gGteau <type 'unicode'>
yes a <type 'unicode'> gGteau <type 'unicode'>
yes G <type 'unicode'> gGteau <type 'unicode'>
Iteration 2
yes a <type 'str'> gateau <type 'str'>
no G <type 'str'> gateau <type 'str'>
yes a <type 'unicode'> gateau <type 'str'>
no G <type 'unicode'> gateau <type 'str'>
yes a <type 'str'> gGteau <type 'str'>
yes G <type 'str'> gGteau <type 'str'>
Unicode error. ASCII decoding error: ordinal not in range(128) a <type 'unicode'> gGteau <type 'str'>
Unicode error. ASCII decoding error: ordinal not in range(128) G <type 'unicode'> gGteau <type 'str'>
yes a <type 'str'> gateau <type 'unicode'>
Type error. 'in <string>' requires character as left operand G <type 'str'> gateau <type 'unicode'>
Type error. 'in <string>' requires character as left operand
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "uni01.py", line 30, in ?
    print 'Type error.', e, cencode(c), type(c), cencode(w), type(w)
  File "uni01.py", line 10, in cencode
    return s.encode('latin-1')
UnicodeError: Latin-1 encoding error: ordinal not in range(256)
>>> ch
['a', '\xe2', u'g\x00\u0178t\x00\x00', u'\xe2']
>>> word
['gateau', 'g\xe2teau', u'gateau', u'g\xe2teau']
>>> execfile('uni01.py')
[ Python shell crashed here ]
---------------------------------
The 8 bit character \xe2 was originally an a-circumflex. It rendered like a
Greek Tau in the Python output but ends up as a 'G' in the above copy.
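The gap between the intended character and what the console showed can be checked by decoding the same byte under two encodings. This is only a sketch: it assumes the Windows console was using code page 437, which is not stated anywhere above, and it targets a current Python rather than 2.2. Under that assumption the byte maps to a Greek capital gamma, which would also fit the 'G' in the pasted copy.

```python
# Sketch: one byte, two interpretations.
# Assumption: the console code page was cp437 (not stated in the post).
raw = b'\xe2'

as_latin1 = raw.decode('latin-1')  # U+00E2, a-circumflex
as_cp437 = raw.decode('cp437')     # U+0393 under cp437

# Print code points rather than the characters themselves, so the
# output does not depend on the terminal's own encoding.
print('latin-1: U+%04X' % ord(as_latin1))
print('cp437:   U+%04X' % ord(as_cp437))
```

If the console used a different code page, the second result would differ, so this only shows that the rendering is a display matter and not a change in the stored byte.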
Can anyone reproduce or shed any light on this problem, please, or am I
making a public demonstration of stupidity here?
Regards,
David Hughes