[Python-Dev] unicode() and its error argument
Tim Peters
tim.one@comcast.net
Sun, 16 Jun 2002 12:41:23 -0400
[Skip Montanaro]
> ...
> Tim's inability to provoke errors was also suggestive that it was pilot
> error, not a problem with the plane.
Ya, but what do I know about encodings? "Nothing" is right -- that's why I
wrote a program to generate stuff at random.
Taking that another step, to generate the encoding at random too, turns up
at least one way to crash Python: the attached program eventually crashes
when doing a utf7 decode. It appears to be in this line:
    if ((ch == '-') || !B64CHAR(ch)) {
and ch "is big" when it blows up. I assume this is because B64CHAR(ch)
expands in part to isalnum(ch), and on Windows the latter is done via array
lookup (and ch is out-of-bounds).
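
A rough illustration of that guess, in Python rather than C. The table and both function names are hypothetical stand-ins, not the real B64CHAR macro or the Windows isalnum(); Python raises IndexError where C would silently read past the end of the array. It only sketches why a range check ahead of the lookup would avoid the problem.

```python
import string

# Hypothetical 256-entry classification table, standing in for the array
# a C library might use to implement isalnum().
_ALNUM = string.ascii_letters + string.digits
ALNUM_TABLE = [chr(i) in _ALNUM for i in range(256)]


def is_b64char_unsafe(ch):
    """Direct table lookup; only valid when 0 <= ch < 256."""
    return ALNUM_TABLE[ch] or ch in (ord('+'), ord('/'))


def is_b64char_guarded(ch):
    """Check the range first, so a large code unit never reaches the table."""
    return 0 <= ch < 256 and is_b64char_unsafe(ch)
```

With these definitions, is_b64char_unsafe(0x20AC) raises IndexError, while is_b64char_guarded(0x20AC) simply returns False.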
Other failures I've seen out of this are benign, like
>>> unicode('\xf1R\x7f^C\x1e\xd8', 'hex_codec', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\CODE\PYTHON\lib\encodings\hex_codec.py", line 41, in hex_decode
    assert errors == 'strict'
AssertionError
>>>
from random import choice, randint
from traceback import print_exc

bytes = [chr(i) for i in range(256)]
paste = ''.join

def generrors(encoding, errors, maxlen, maxtries):
    for dummy in xrange(maxtries):
        n = randint(1, maxlen)
        raw = paste([choice(bytes) for dummy in range(n)])
        try:
            u = unicode(raw, encoding, errors)
        except:
            print 'failure in unicode(%r, %r, %r)' % (raw, encoding, errors)
            print_exc(0)
            return 1
    return 0

from encodings.aliases import aliases
unique = aliases.values()
unique = dict(zip(unique, unique)).keys()

while unique:
    e = choice(unique)
    print
    print 'Trying', e
    if generrors(e, 'ignore', 10, 1000):
        unique.remove(e)
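
For anyone trying the same experiment on Python 3, a minimal sketch of the idea follows. It is an adaptation, not the program above: it assumes bytes.decode() in place of unicode(), takes a seeded random.Random so runs are repeatable, and makes a single pass over the codec names instead of looping until every codec has failed. The sweep() helper and the rng parameter are additions made for illustration.

```python
import random
from encodings.aliases import aliases


def generrors(encoding, errors, maxlen, maxtries, rng):
    """Return (raw, exception) for the first failing decode, else None."""
    for _ in range(maxtries):
        n = rng.randint(1, maxlen)
        raw = bytes(rng.randrange(256) for _ in range(n))
        try:
            raw.decode(encoding, errors)
        except Exception as exc:
            # Includes LookupError for names that are not text encodings
            # or are unavailable on this platform.
            return raw, exc
    return None


def sweep(seed=0, maxlen=10, maxtries=1000):
    """Try each distinct codec name once; map failing names to what failed."""
    rng = random.Random(seed)
    failures = {}
    for name in sorted(set(aliases.values())):
        hit = generrors(name, 'ignore', maxlen, maxtries, rng)
        if hit is not None:
            failures[name] = hit
    return failures


if __name__ == '__main__':
    for name, (raw, exc) in sweep().items():
        print('failure in %r.decode(%r, %r): %r' % (raw, name, 'ignore', exc))
```

Catching Exception rather than using a bare except keeps KeyboardInterrupt usable during a long run; a hard crash in the interpreter would still take the whole process down, as it does in the original.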