[Python-3000] PEP 3108: Standard Library Reorganization

Barry Warsaw barry at python.org
Wed Jan 3 05:37:49 CET 2007

On Jan 2, 2007, at 10:53 PM, Anthony Baxter wrote:

> Additionally, base32 and base16 are not supported by codecs,
> according to the docs, and neither is the ability to specify
> alternate character mappings (I don't know how heavily used the
> last is, though).

Which reminds me of another problem I have with the codecs module.
Codecs are a potential security issue because they are a backdoor way
to get modules imported.  For example, if I get an email with a
specified charset, the natural thing is to call .decode() to
turn the payload into a unicode string.  The problem is that Python
can essentially be tricked into importing any module that way.  Try this:

Python 2.4.3 (#1, Jun 12 2006, 19:42:21)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['smtplib']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'smtplib'
>>> 'foo'.decode('smtplib')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: smtplib
>>> sys.modules['smtplib']
<module 'smtplib' from '/opt/local/Library/Frameworks/ 

Okay, so this doesn't open any more holes than are already open,
but I worry that the style of encouraging codec use will make such
potential holes more widespread.  The problem is that it's very
difficult to code defensively around this because you can't really
whitelist or blacklist the set of valid codecs, and you can't
feasibly audit every importable module to see if it has nasty import
side effects.
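
To make the difficulty concrete, here is a minimal sketch of the kind of
defensive check an application could bolt on before calling .decode().
It assumes bytes input on a current Python; the names ALLOWED_CHARSETS
and safe_decode are hypothetical and not part of any library.  It is only
a partial mitigation: the list has to be maintained by hand, and any
legitimate charset missing from it is silently decoded with the fallback.

```python
# Hypothetical allow-list of charset names the application agrees to decode.
# The entries are illustrative; a real mail handler would need many more.
ALLOWED_CHARSETS = frozenset([
    "ascii", "us-ascii", "utf-8", "latin-1", "iso-8859-1",
])


def safe_decode(data, charset, fallback="ascii"):
    """Decode bytes, passing only allow-listed charset names to .decode().

    An untrusted name such as 'smtplib' never reaches the codec machinery;
    the fallback charset is used instead, with undecodable bytes replaced.
    """
    name = charset.strip().lower().replace("_", "-")
    if name not in ALLOWED_CHARSETS:
        name = fallback
    return data.decode(name, "replace")
```

With this in place, safe_decode(payload, 'smtplib') decodes the payload
as ASCII rather than handing the attacker-chosen name to the codec lookup.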

It would help if codec lookup weren't import based, or if any
import side effects could somehow be isolated.
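
As a rough illustration of what a lookup that is not import based might
look like, the sketch below resolves charset names through a fixed table
of decoder functions that the codecs module already exposes.  The table
_DECODERS and the helpers lookup_decoder and decode_with_table are
hypothetical names made up for this example, and it assumes bytes input
on a current Python.  It is not how the codecs module itself works.

```python
import codecs

# Hypothetical fixed registry: charset name -> decoder function.
# Every decoder is a built-in already exposed by the codecs module, so
# resolving a name is a plain dict lookup and never triggers an import.
_DECODERS = {
    "ascii": codecs.ascii_decode,
    "latin-1": codecs.latin_1_decode,
    # final=True so that a truncated multi-byte sequence is reported as an
    # error instead of being silently left undecoded.
    "utf-8": lambda data, errors: codecs.utf_8_decode(data, errors, True),
}


def lookup_decoder(charset):
    """Return the decoder for charset, or raise LookupError if unknown."""
    try:
        return _DECODERS[charset.strip().lower()]
    except KeyError:
        raise LookupError("unsupported charset: %r" % charset)


def decode_with_table(data, charset, errors="strict"):
    """Decode bytes using only the fixed table above."""
    text, _consumed = lookup_decoder(charset)(data, errors)
    return text
```

An unknown name such as 'smtplib' simply raises LookupError from the
dict lookup; the cost is that every supported charset must be listed
explicitly.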

-Barry
