[ python-Bugs-960874 ] codecs.lookup can raise exceptions other than LookupError

Wed May 26 14:53:45 EDT 2004

Bugs item #960874, was opened at 2004-05-26 15:37
Message generated for change (Comment added) made by mwh
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=960874&group_id=5470

Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John Ehresman (jpe)
Assigned to: M.-A. Lemburg (lemburg)
Summary: codecs.lookup can raise exceptions other than LookupError

Initial Comment:
codecs.lookup raises ValueError when given an empty 
string and UnicodeEncodeError when given a unicode 
object that can't be converted to a str in the default 
encoding.  I'd expect it to raise LookupError when 
passed any basestring instance.

For example:
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> codecs.lookup('')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\python23\lib\encodings\__init__.py", line 84, in search_function
    globals(), locals(), _import_tail)
ValueError: Empty module name
>>> codecs.lookup(u'\uabcd')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\uabcd' in position 0: ordinal not in range(128)
>>>
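The requested behavior can be approximated on the caller's side. The following is a minimal sketch that assumes Python 3 rather than the 2.3.3 interpreter shown above. The hypothetical helper `strict_lookup` is not part of the codecs module. It re-raises any ValueError from `codecs.lookup` as LookupError. Because UnicodeError, and therefore UnicodeEncodeError, subclasses ValueError, one except clause covers both failures reported here. Non-string arguments are deliberately left to raise TypeError.

```python
import codecs


def strict_lookup(name):
    """Look up a codec, reporting every unusable *string* name as LookupError.

    Hypothetical helper, not part of the codecs module. LookupError from
    codecs.lookup passes through unchanged; ValueError (which includes
    UnicodeError and UnicodeEncodeError) is converted to LookupError.
    Non-string arguments still raise TypeError from codecs.lookup itself.
    """
    try:
        return codecs.lookup(name)
    except ValueError as exc:  # UnicodeError subclasses ValueError
        raise LookupError("unknown encoding: %r" % (name,)) from exc


# Usage: a single except clause now covers every bad string name.
for candidate in ["utf-8", "", "\x00", "\ud800", "no-such-codec"]:
    try:
        print(ascii(candidate), "->", strict_lookup(candidate).name)
    except LookupError as exc:
        print(ascii(candidate), "-> LookupError:", exc)
```

The original exception is chained with `from exc`, so the underlying cause stays visible in tracebacks.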

----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2004-05-26 19:53

Message:
Logged In: YES 
user_id=6656

Well, *I* don't think that's a particularly good idea.  I don't know if 
Marc-André feels differently.

----------------------------------------------------------------------

Comment By: John Ehresman (jpe)
Date: 2004-05-26 19:47

Message:
Logged In: YES 
user_id=22785

Yes, it does look like lookup('') is fixed in CVS.  So the 
question is whether lookup() of something that isn't 
convertible in the current encoding to a char* should raise a
LookupError.  I can live with it not, though if it did, it would 
make it a bit easier to determine if an arbitrary unicode string 
is a name of a supported encoding.  

I'm willing to put together a patch to raise LookupError if 
that's what the behavior should be.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-05-26 18:13

Message:
Logged In: YES 
user_id=6656

This much seems to be fixed in CVS, actually :-)

----------------------------------------------------------------------

Comment By: John Ehresman (jpe)
Date: 2004-05-26 18:09

Message:
Logged In: YES 
user_id=22785

The other exceptions occur when strings or unicode objects 
are passed in as an argument.  The string that it fails on is 
the empty string ('').  I can see disallowing non-ASCII names,
but '' should raise a LookupError.

My use case is to see if a user-supplied unicode string is a
valid encoding, so any check that the lookup function does 
not do, I will need to do before calling it.
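A sketch of that caller-side validation follows. It assumes Python 3.7+ because it uses `str.isascii`. The hypothetical `is_supported_encoding` rejects non-strings, the empty string, and non-ASCII or non-printable names before calling `codecs.lookup`. These pre-checks are assumptions chosen to match this discussion, not a documented codecs rule. For that reason ValueError is still caught as a safety net.

```python
import codecs


def is_supported_encoding(name):
    """Return True if `name` names a codec known to this interpreter.

    Hypothetical helper. The pre-checks (str, non-empty, ASCII-only,
    printable) are assumed caller-side rules, not a codecs contract.
    """
    if not isinstance(name, str):
        return False  # codecs.lookup would raise TypeError here
    if not name or not name.isascii() or not name.isprintable():
        return False  # reject '', non-ASCII and control characters up front
    try:
        codecs.lookup(name)
    except (LookupError, ValueError):  # ValueError kept as a safety net
        return False
    return True


user_input = ["UTF-8", "latin-1", "", "\uabcd", "no-such-codec"]
print([is_supported_encoding(s) for s in user_input])
# -> [True, True, False, False, False]
```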

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-05-26 17:32

Message:
Logged In: YES 
user_id=6656

What exactly are you complaining about?  I'd expect codecs.lookup 
to raise TypeError if called with no arguments or an integer.

I believe it's documented somewhere that encoding names must
be ASCII-only, but I must admit I don't recall where.
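Which exception a given argument produces depends on the interpreter version, so it can be worth probing directly. The hypothetical `lookup_exception` function below is a Python 3 sketch. It calls `codecs.lookup` with the given arguments and returns the class of any exception raised, or None on success. On current CPython, calling it with no arguments or with an integer should report TypeError, which matches the expectation above. An unknown ASCII name should report LookupError.

```python
import codecs


def lookup_exception(*args):
    """Return the exception class codecs.lookup(*args) raises, or None.

    Hypothetical probe for checking, on the running interpreter, which
    exception each kind of argument produces.
    """
    try:
        codecs.lookup(*args)
    except Exception as exc:  # report any failure, whatever its class
        return type(exc)
    return None


# No arguments, an integer, a known name and an unknown name.
for args in [(), (42,), ("utf-8",), ("no-such-codec",)]:
    print(args, lookup_exception(*args))
```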

----------------------------------------------------------------------
