Programmatically discovering encoding types supported by codecs module
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Wed Mar 24 18:50:11 EDT 2010
On Wed, 24 Mar 2010 14:58:47 -0300, <python at bdurham.com> wrote:
>> After looking at how things are done in codecs.c and
>> encodings/__init__.py I think you should enumerate all modules in the
>> encodings package that define a getregentry function. Aliases come from
>> encodings.aliases.aliases.
>
> Thanks for looking into this for me. Benjamin Kaplan made a similar
> observation. My reply to him included the snippet of code we're using to
> generate the actual list of encodings that our software will support
> (thanks to Python's codecs and encodings modules).
I was curious as to whether both methods would give the same results:
py> import glob, os, encodings, encodings.aliases
py> modules = set()
py> for name in glob.glob(os.path.join(encodings.__path__[0], "*.py")):
...     name = os.path.basename(name)[:-3]
...     # a non-empty fromlist makes __import__ return the submodule itself
...     try:
...         mod = __import__("encodings." + name,
...                          fromlist=['ilovepythonbutsometimesihateit'])
...     except ImportError:
...         continue
...     if hasattr(mod, 'getregentry'):
...         modules.add(name)
...
py> fromalias = set(encodings.aliases.aliases.values())
py> fromalias - modules
set(['tactis'])
py> modules - fromalias
set(['charmap',
'cp1006',
'cp737',
'cp856',
'cp874',
'cp875',
'idna',
'iso8859_1',
'koi8_u',
'mac_arabic',
'mac_centeuro',
'mac_croatian',
'mac_farsi',
'mac_romanian',
'palmos',
'punycode',
'raw_unicode_escape',
'string_escape',
'undefined',
'unicode_escape',
'unicode_internal',
'utf_8_sig'])
So the aliases table refers to a 'tactis' encoding that has no module (?), and 22 codec modules have no alias at all.
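
For anyone repeating this on Python 3, here is a rough sketch of the same comparison. It is not what I ran above: it swaps glob and __import__ for pkgutil.iter_modules and importlib.import_module, assumes Python 3.6 or later, and the helper names codec_modules and unresolvable are made up for illustration. It also asks codecs.lookup() whether the alias targets that lack a module can be resolved at all. The sets it prints depend on the Python version and platform, so no expected output is shown.

```python
import codecs
import importlib
import pkgutil

import encodings
import encodings.aliases


def codec_modules():
    """Return names of encodings submodules that define getregentry()."""
    found = set()
    for info in pkgutil.iter_modules(encodings.__path__):
        try:
            mod = importlib.import_module("encodings." + info.name)
        except ImportError:
            # Some codecs are platform-specific or need optional
            # extension modules, so skip the ones that fail to import.
            continue
        if hasattr(mod, "getregentry"):
            found.add(info.name)
    return found


def unresolvable(names):
    """Return the subset of names that codecs.lookup() rejects."""
    bad = set()
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            bad.add(name)
    return bad


modules = codec_modules()
fromalias = set(encodings.aliases.aliases.values())

alias_only = fromalias - modules    # alias targets with no usable module
module_only = modules - fromalias   # codec modules that no alias points to

print(sorted(alias_only))
print(sorted(module_only))
print(sorted(unresolvable(alias_only)))
```

The last line is only a cross-check: an alias target that shows up there is one the codec registry cannot actually serve on the running interpreter.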
--
Gabriel Genellina