Programmatically discovering encoding types supported by codecs module

python at bdurham.com
Sun Mar 28 12:48:52 CEST 2010


Gabriel,

Thank you for your analysis - very interesting. Enjoyed your fromlist
choice of names. I'm still in my honeymoon phase with Python so I only
know the first part :)

Regards,
Malcolm


----- Original message -----
From: "Gabriel Genellina" <gagsl-py2 at yahoo.com.ar>
To: python-list at python.org
Date: Wed, 24 Mar 2010 19:50:11 -0300
Subject: Re: Programmatically discovering encoding types supported by
codecs module

On Wed, 24 Mar 2010 14:58:47 -0300, <python at bdurham.com> wrote:

>> After looking at how things are done in codecs.c and
>> encodings/__init__.py, I think you should enumerate all modules in the
>> encodings package that define a getregentry function. Aliases come from
>> encodings.aliases.aliases.
>
> Thanks for looking into this for me. Benjamin Kaplan made a similar
> observation. My reply to him included the snippet of code we're using to
> generate the actual list of encodings that our software will support
> (thanks to Python's codecs and encodings modules).

I was curious as to whether both methods would give the same results:

py> import glob, os, encodings, encodings.aliases
py> modules = set()
py> for name in glob.glob(os.path.join(encodings.__path__[0], "*.py")):
...   name = os.path.basename(name)[:-3]   # strip the ".py" suffix
...   try:
...     mod = __import__("encodings." + name,
...                      fromlist=['ilovepythonbutsometimesihateit'])
...   except ImportError:
...     continue
...   if hasattr(mod, 'getregentry'):
...     modules.add(name)
...
py> fromalias = set(encodings.aliases.aliases.values())
py> fromalias - modules
set(['tactis'])
py> modules - fromalias
set(['charmap',
      'cp1006',
      'cp737',
      'cp856',
      'cp874',
      'cp875',
      'idna',
      'iso8859_1',
      'koi8_u',
      'mac_arabic',
      'mac_centeuro',
      'mac_croatian',
      'mac_farsi',
      'mac_romanian',
      'palmos',
      'punycode',
      'raw_unicode_escape',
      'string_escape',
      'undefined',
      'unicode_escape',
      'unicode_internal',
      'utf_8_sig'])

So the alias table points to a 'tactis' encoding that has no module (?),
and 22 codec modules have no alias at all.
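
For anyone repeating the comparison on Python 3, here is a minimal sketch
of the same idea. It assumes pkgutil.iter_modules and
importlib.import_module as stand-ins for the glob/__import__ combination
above; the helper name codec_modules is made up for illustration. The
resulting sets depend on the Python version and platform, so no output is
shown.

```python
import encodings
import encodings.aliases
import importlib
import pkgutil


def codec_modules():
    """Return names of encodings submodules that define getregentry().

    Hypothetical helper, not part of the standard library.
    """
    found = set()
    for info in pkgutil.iter_modules(encodings.__path__):
        try:
            mod = importlib.import_module("encodings." + info.name)
        except ImportError:
            # Some codecs only import on certain platforms.
            continue
        if hasattr(mod, "getregentry"):
            found.add(info.name)
    return found


modules = codec_modules()
from_alias = set(encodings.aliases.aliases.values())

# Alias targets that have no importable codec module here.
print("aliased but no module:", sorted(from_alias - modules))
# Codec modules that no alias points to.
print("module but no alias:  ", sorted(modules - from_alias))
```

Using importlib.import_module avoids the dummy fromlist argument, since
it returns the submodule itself rather than the top-level package.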

-- 
Gabriel Genellina
