[ python-Bugs-1446043 ] unicode('foo', '.utf99') does not raise LookupError

SourceForge.net noreply at sourceforge.net
Thu Aug 17 21:17:59 CEST 2006


Bugs item #1446043, was opened at 2006-03-09 00:55
Message generated for change (Settings changed) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1446043&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
>Priority: 8
Submitted By: osvenskan (osvenskan)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: unicode('foo', '.utf99') does not raise LookupError 

Initial Comment:
A very minor inconsistency: when I call unicode()
with an encoding that Python doesn't know about, it
usually raises a LookupError (e.g., LookupError:
unknown encoding: utf99). But when the encoding name
begins with a dot (ASCII 0x2e), Python instead raises
ValueError: Empty module name. Raising an error is
certainly correct, but it should be a LookupError,
not a ValueError.

I've reproduced this under Python 2.4.1/FreeBSD 6.0 and
2.3/OS X. See attachment for steps to reproduce.
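
The attachment is not included in this message. As a rough sketch of the
reproduction, the probe below reports which exception type an encoding name
triggers. It assumes Python 3, where unicode() no longer exists, so
codecs.lookup() stands in for the lookup step. The helper name
lookup_error_type is made up for illustration. On the 2.3/2.4 interpreters
described above, the name with the leading dot is the one that produced
ValueError instead of LookupError.

```python
import codecs


def lookup_error_type(encoding_name):
    """Return the name of the exception raised when looking up
    encoding_name, or None if the lookup succeeds.

    Hypothetical helper for illustration; not part of the report.
    """
    try:
        codecs.lookup(encoding_name)
    except Exception as exc:  # broad on purpose: we are probing the type
        return type(exc).__name__
    return None


if __name__ == "__main__":
    for name in ("utf-8", "utf99", ".utf99"):
        print(repr(name), "->", lookup_error_type(name))
```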



----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2006-08-17 19:17

Message:
Logged In: YES 
user_id=849994

I'd say that this should be fixed before 2.5 final.

Attached patch (the modname that's used for import may not
contain a dot anymore...)

----------------------------------------------------------------------

Comment By: osvenskan (osvenskan)
Date: 2006-04-06 14:45

Message:
Logged In: YES 
user_id=1119995

I noticed that the documentation for unicode() says, "if the
encoding is not known, LookupError is raised". Regarding the
3rd parameter ("errors") to unicode(), the docs say, "Error
handling is done according to errors; this specifies the
treatment of characters which are invalid in the input
encoding. If errors is 'strict' (the default), a ValueError
is raised on errors..."
ref: http://docs.python.org/lib/built-in-funcs.html

That makes the code's current behavior doubly confusing
because the documentation says that a ValueError is
reserved for indicating an undecodable byte sequence, not an
unknown encoding name.


----------------------------------------------------------------------

Comment By: osvenskan (osvenskan)
Date: 2006-03-09 15:04

Message:
Logged In: YES 
user_id=1119995

There are encoding names that contain dots, such as
ANSI_X3.4-1968, ANSI_X3.4-1986 and ISO_646.IRV:1991 (as
reported by iconv). There are none in iconv's list that
begin with a dot. 

Please note that the behavior of this function has been
discussed before in Python bugs 513666 and 960874. Apologies
for not referencing them in my original report. 

Having stepped through the code, I understand how the
ValueError is getting generated. My frustration with this as
a programmer is that I want to write specific except clauses
for each possible exception that a method can raise, but
that's impractical if any exception is fair game on any
method. So I'm forced to use a catch-all except clause about
which the Python documentation says (wisely, IMHO), "Use
this with extreme caution, since it is easy to mask a real
programming error in this way!" While it is helpful to
document errors that a method is *likely* to raise, my code
needs to handle all possibilities, not just likely ones.
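
A sketch of the specific except clauses described above, assuming Python 3
(bytes.decode() in place of unicode()). The name decode_or_report is
hypothetical. It relies only on the documented contract: LookupError for an
unknown encoding name, and UnicodeDecodeError (a subclass of ValueError) for
undecodable input.

```python
def decode_or_report(data, encoding):
    """Decode bytes, mapping each documented failure to a label.

    Hypothetical helper: returns (text, None) on success or
    (None, reason) on failure.
    """
    try:
        return data.decode(encoding), None
    except LookupError:
        # The encoding name is not known to the codec registry.
        return None, "unknown encoding"
    except UnicodeDecodeError:
        # The bytes are not valid in the named encoding ('strict' mode).
        return None, "undecodable input"
```

If an unknown encoding name surfaced as a plain ValueError, as in this
report, neither clause would match and the exception would propagate to the
caller.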

Perhaps the answer is just, "This is how Python works" and
if I feel it is a weakness in the language I need to take it
up on a different level. 

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-03-09 08:16

Message:
Logged In: YES 
user_id=849994

Is it possible for an encoding name to contain dots at all?

If not, this would do too:
if '.' in modname: continue

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-03-09 08:12

Message:
Logged In: YES 
user_id=89016

The problem is that after normalizing the encoding name a
module with this name is imported. Maybe
encodings/__init__.py:search_function should do:

if ".".join(filter(None, modname.split("."))) != modname:
    return None
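
For comparison, the two checks proposed in this thread can be tried on a few
sample names. This is a standalone sketch for Python 3, not the code in
encodings/__init__.py. The function names are made up, and the sample names
are written as they appear in the thread rather than in normalized form.

```python
def rejected_by_dot_join(modname):
    """Check from this comment: reject names with leading, trailing,
    or doubled dots (any empty component after splitting on '.')."""
    return ".".join(filter(None, modname.split("."))) != modname


def rejected_by_any_dot(modname):
    """Stricter alternative from the thread: reject any name with a dot."""
    return "." in modname


if __name__ == "__main__":
    for name in (".utf99", "utf99", "ANSI_X3.4-1968", "a..b", "utf8."):
        print(repr(name), rejected_by_dot_join(name), rejected_by_any_dot(name))
```

The first check lets a name with an interior dot such as ANSI_X3.4-1968
through to the import step, while the second skips every dotted name.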


----------------------------------------------------------------------
