Programmatically discovering encoding types supported by codecs module

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Mar 24 18:50:11 EDT 2010


On Wed, 24 Mar 2010 14:58:47 -0300, <python at bdurham.com> wrote:

>> After looking at how things are done in codecs.c and  
>> encodings/__init__.py  I think you should enumerate all modules in the  
>> encodings package that define a getregentry function. Aliases come from  
>> encodings.aliases.aliases.
>
> Thanks for looking into this for me. Benjamin Kaplan made a similar
> observation. My reply to him included the snippet of code we're using to
> generate the actual list of encodings that our software will support
> (thanks to Python's codecs and encodings modules).

I was curious as to whether both methods would give the same results:

py> import glob, os, encodings, encodings.aliases
py> modules=set()
py> for name in glob.glob(os.path.join(encodings.__path__[0], "*.py")):
...   name = os.path.basename(name)[:-3]
...   # a non-empty fromlist makes __import__ return the submodule itself
...   try: mod = __import__("encodings."+name, fromlist=['ilovepythonbutsometimesihateit'])
...   except ImportError: continue
...   if hasattr(mod, 'getregentry'):
...     modules.add(name)
...
py> fromalias = set(encodings.aliases.aliases.values())
py> fromalias - modules
set(['tactis'])
py> modules - fromalias
set(['charmap',
      'cp1006',
      'cp737',
      'cp856',
      'cp874',
      'cp875',
      'idna',
      'iso8859_1',
      'koi8_u',
      'mac_arabic',
      'mac_centeuro',
      'mac_croatian',
      'mac_farsi',
      'mac_romanian',
      'palmos',
      'punycode',
      'raw_unicode_escape',
      'string_escape',
      'undefined',
      'unicode_escape',
      'unicode_internal',
      'utf_8_sig'])

So 'tactis' is an alias target with no matching codec module (?), and 22 codec modules have no alias at all.
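
For anyone wanting to repeat the comparison on Python 3, here is a minimal sketch of the same idea. It assumes pkgutil.iter_modules and importlib.import_module are acceptable replacements for the glob/__import__ approach above; the helper name codec_modules is mine, not something from the standard library. The resulting sets will differ between Python versions and platforms, so no output is shown.

```python
import encodings
import encodings.aliases
import importlib
import pkgutil


def codec_modules():
    """Return the names of modules in the encodings package that define getregentry.

    Hypothetical helper for illustration; not part of the standard library.
    """
    found = set()
    for info in pkgutil.iter_modules(encodings.__path__):
        try:
            mod = importlib.import_module("encodings." + info.name)
        except ImportError:
            # Some codecs only import on certain platforms (e.g. mbcs on Windows).
            continue
        if hasattr(mod, "getregentry"):
            found.add(info.name)
    return found


modules = codec_modules()
fromalias = set(encodings.aliases.aliases.values())

# Alias targets that have no codec module behind them.
print(sorted(fromalias - modules))
# Codec modules that no alias points to.
print(sorted(modules - fromalias))
```

The union of both sets (modules | fromalias) is probably the closest thing to a complete list, though any name taken only from the alias table should still be checked with codecs.lookup before being offered to users.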

-- 
Gabriel Genellina



