[ python-Bugs-1446043 ] unicode('foo', '.utf99') does not raise LookupError

Thu Mar 9 16:04:14 CET 2006

Bugs item #1446043, was opened at 2006-03-08 19:55
Message generated for change (Comment added) made by osvenskan
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1446043&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: osvenskan (osvenskan)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode('foo', '.utf99') does not raise LookupError 

Initial Comment:
A very minor inconsistency -- when I call unicode()
with an encoding that Python doesn't know about, it
usually raises a LookupError (e.g. LookupError:
unknown encoding: utf99). But when the encoding name
begins with a dot (ASCII 0x2e), Python instead raises
ValueError: Empty module name. Python is certainly
correct to raise an error, but it should be a
LookupError, not a ValueError.

I've reproduced this under Python 2.4.1 on FreeBSD 6.0 and
Python 2.3 on OS X. See the attachment for steps to reproduce.
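
The attachment is not included in this message. As a rough sketch of the
comparison, written for current Python rather than 2.4, the snippet below
probes codecs.lookup(), which consults the same codec registry that
unicode() uses to resolve an encoding name. The helper name
lookup_error_name is purely illustrative. Which exception the dotted name
produces depends on the Python version, so no output is shown.

```python
import codecs


def lookup_error_name(encoding):
    """Return the name of the exception codecs.lookup() raises, or None."""
    try:
        codecs.lookup(encoding)
    except Exception as exc:  # deliberately broad: this is only a probe
        return type(exc).__name__
    return None


# Compare an ordinary unknown name with one that begins with a dot.
for name in ("utf99", ".utf99"):
    print(name, lookup_error_name(name))
```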

----------------------------------------------------------------------

>Comment By: osvenskan (osvenskan)
Date: 2006-03-09 10:04

Message:

There are encoding names that contain dots, such as
ANSI_X3.4-1968, ANSI_X3.4-1986 and ISO_646.IRV:1991 (as
reported by iconv). There are none in iconv's list that
begin with a dot. 

Please note that the behavior of this function has been
discussed before in Python bugs 513666 and 960874. Apologies
for not referencing them in my original report. 

Having stepped through the code, I understand how the
ValueError is getting generated. My frustration with this as
a programmer is that I want to write specific except clauses
for each possible exception that a method can raise, but
that's impractical if any exception is fair game on any
method. So I'm forced to use a catch-all except clause about
which the Python documentation says (wisely, IMHO), "Use
this with extreme caution, since it is easy to mask a real
programming error in this way!" While it is helpful to
document errors that a method is *likely* to raise, my code
needs to handle all possibilities, not just likely ones.

Perhaps the answer is just, "This is how Python works" and
if I feel it is a weakness in the language I need to take it
up on a different level. 
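
One way to keep the except clause narrow in the meantime, sketched under
the assumption that only the two exception types seen in this report need
handling. find_codec is a hypothetical helper, not part of the standard
library. Resolving the codec separately from decoding matters here because
UnicodeDecodeError is itself a subclass of ValueError, so catching
ValueError around a full decode would also hide bad input data.

```python
import codecs


def find_codec(name):
    """Return the CodecInfo for name, or None if the name cannot be resolved.

    Catches only the two exception types reported in this bug instead of
    using a bare except clause.
    """
    try:
        return codecs.lookup(name)
    except (LookupError, ValueError):
        return None


for name in ("utf-8", "utf99", ".utf99"):
    info = find_codec(name)
    print(name, "found" if info is not None else "not found")
```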

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-03-09 03:16

Message:

Is it possible for an encoding name to contain dots at all?

If not, this would do too:
if '.' in modname: continue

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-03-09 03:12

Message:

The problem is that, after the encoding name is normalized, a
module with that name is imported. Maybe
encodings/__init__.py:search_function should do:

if ".".join(filter(None, modname.split("."))) != modname:
    return None
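
A standalone sketch of what this check accepts and rejects, next to the
simpler '.' in modname test suggested in the comment above. The function
names and the sample list are illustrative only; the encoding names are
the ones mentioned in this thread. It shows only what each test matches,
not how the codec machinery would treat those names afterwards.

```python
def has_empty_component(modname):
    """True if modname has a leading, trailing, or doubled dot."""
    return ".".join(filter(None, modname.split("."))) != modname


def contains_dot(modname):
    """True for any name that contains a dot at all."""
    return "." in modname


for name in ("utf99", ".utf99", "ANSI_X3.4-1968", "ISO_646.IRV:1991"):
    print(name, has_empty_component(name), contains_dot(name))
```

The first test turns away only names such as .utf99, whose split yields an
empty component, while the second also matches the dotted names that iconv
reports.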

----------------------------------------------------------------------
