Re: [I18n-sig] Codecs for Big Five and GB 2312
> If you are interested, the codec is available at:
> http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/iso_2022_7bit.py.gz
I just had a look, and it seems like an interesting package. I'm slightly confused about the installation procedure, though. Installing into python2.0/encodings/{euc_jp,shift_jis,japanese} doesn't look right to me - add-on packages should be capable of installing into site-packages by default.
I believe it would actually work if you just install without any arguments to setup.py. euc_jp would then end up in python2.0/site-packages. Later, when you do
    u"Hello".encode("euc-jp")
it looks for a codec. Here, encodings.__init__.search_function does
    modname = encoding.replace('-', '_')
    modname = aliases.aliases.get(modname, modname)
    try:
        mod = __import__(modname, globals(), locals(), '*')
    except ImportError, why:
        _cache[encoding] = None
        return None
First, encoding becomes euc_jp. With no registered aliases, it would then call __import__ with "euc_jp", which will find the codec in site-packages.
In the long run, I'd hope that distutils provides a means to install additional codecs, e.g. via
    setup( ... codecs = ['japanese'] ...)
Then, distutils would collect all these strings, and importing codecs would roughly do
    for package in distutils.registered_codec_packages:
        p = __import__(package, globals(), locals(), "*")
        p.register()
japanese/__init__.py would provide a register function which registers another search_function, which would load euc_jp and shift_jis on demand. That way, users could install additional codecs which are available to everybody on the system, without having to hack the Python library proper.
Regards, Martin
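A minimal sketch of the lookup Martin describes, written for modern Python 3 rather than Python 2.0: a search function that turns the encoding name into a module name and imports it from anywhere on sys.path, such as site-packages. It assumes, for illustration, that a codec module exposes a getregentry() function returning a codecs.CodecInfo, the convention followed by the modules in the stdlib encodings package. The module name demo_site_codec is hypothetical and is faked in sys.modules so the example runs without installing anything. Note that in current Python 3, encodings.search_function itself no longer does this: it only imports from inside the encodings package, partly to avoid importing arbitrary top-level modules named by untrusted input, so an add-on wanting this behaviour would have to register its own search function.

```python
import codecs
import importlib
import sys
import types

_cache = {}

def search_function(encoding):
    # Same first step as the quoted excerpt: "euc-jp" -> "euc_jp".
    modname = encoding.replace("-", "_").lower()
    if not modname or "." in modname:
        return None
    if modname in _cache:
        return _cache[modname]
    try:
        # Any top-level module on sys.path qualifies, e.g. one in site-packages.
        entry = importlib.import_module(modname).getregentry()
    except (ImportError, AttributeError):
        entry = None  # not a codec module; let other search functions try
    _cache[modname] = entry
    return entry

codecs.register(search_function)

# Hypothetical stand-in for a codec module installed in site-packages;
# it simply reuses the stdlib latin-1 codec.
def _getregentry():
    base = codecs.lookup("latin-1")
    return codecs.CodecInfo(base.encode, base.decode, name="demo_site_codec")

_demo = types.ModuleType("demo_site_codec")
_demo.getregentry = _getregentry
sys.modules["demo_site_codec"] = _demo

print("é".encode("demo-site-codec"))  # b'\xe9'
```

Because the search function accepts any importable module that has a getregentry() attribute, the name normalization step is the only thing tying "euc-jp" to a module called euc_jp.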
"Martin v. Loewis" wrote:
>> If you are interested, the codec is available at:
>> http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/iso_2022_7bit.py.gz
> I just had a look, and it seems like an interesting package. I'm slightly confused about the installation procedure, though. Installing into python2.0/encodings/{euc_jp,shift_jis,japanese} doesn't look right to me - add-on packages should be capable of installing into site-packages by default.
> I believe it would actually work if you just install without any arguments to setup.py. euc_jp would then end up in python2.0/site-packages. Later, when you do
>     u"Hello".encode("euc-jp")
> it looks for a codec. Here, encodings.__init__.search_function does
>     modname = encoding.replace('-', '_')
>     modname = aliases.aliases.get(modname, modname)
>     try:
>         mod = __import__(modname, globals(), locals(), '*')
>     except ImportError, why:
>         _cache[encoding] = None
>         return None
> First, encoding becomes euc_jp. With no registered aliases, it would then call __import__ with "euc_jp", which will find the codec in site-packages.
The "right" way to install new codec packages is by placing them inside a package which then registers a new search function in the codec registry. Tamito's other package does this, AFAIR. To be able to use the codecs, a Python script must then import the codec package (which then registers the search function). Having to import the package has two benefits:
1. the need for another codec package is visible in the source code
2. registering the search function is delayed until the codec package is first used
> In the long run, I'd hope that distutils provides a means to install additional codecs, e.g. via
>     setup( ... codecs = ['japanese'] ...)
> Then, distutils would collect all these strings, and importing codecs would roughly do
>     for package in distutils.registered_codec_packages:
>         p = __import__(package, globals(), locals(), "*")
>         p.register()
> japanese/__init__.py would provide a register function which registers another search_function, which would load euc_jp and shift_jis on demand. That way, users could install additional codecs which are available to everybody on the system, without having to hack the Python library proper.
Hmm, not sure here: programs which rely on non-standard codecs should have an explicit "import myCodecs" at the top of the file.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
> Having to import the package has two benefits:
> 1. the need for another codec package is visible in the source code
I don't think this is a benefit for a typical application that uses multiple codecs. More often than not, the application will learn about a required encoding by means of an application-level protocol (e.g. a Content-Type in a MIME header). It doesn't really *require* any encoding; instead, it needs the codecs of any data it happens to process in a certain session. The application designer is normally not interested in a specific encoding; she expects Python to do the right thing whenever .encode is invoked.
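A minimal sketch of the situation Martin describes, assuming a MIME-style message: the charset comes from the Content-Type header at run time, so the program cannot know in advance which codec it will need. It uses email.message.Message.get_content_charset() from the modern Python 3 standard library; the function name decode_body and the us-ascii fallback are illustrative choices, not something from the thread.

```python
import codecs
from email.message import Message

def decode_body(body: bytes, content_type: str, default: str = "us-ascii") -> str:
    # Let the email package parse the header and its charset parameter.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(default)  # lower-cased, e.g. "euc-jp"
    try:
        codecs.lookup(charset)
    except LookupError:
        # The data names an encoding that no installed codec handles.
        raise LookupError(f"no codec installed for charset {charset!r}") from None
    return body.decode(charset)
```

For example, decode_body("日本".encode("euc-jp"), "text/plain; charset=EUC-JP") returns "日本"; the code never mentions Japanese codecs explicitly, it just needs one to be registered when such data arrives.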
> 2. registering the search function is delayed until the codec package is first used
That is hardly a benefit: registering the search function is not an expensive operation, and the typical application would start with
    try:
        import japanese
    except ImportError:
        pass
    try:
        import windows_codepages
    except ImportError:
        pass
    try:
        import iana
    except ImportError:
        pass
    try:
        import OSFCharmaps
    except ImportError:
        pass
anyway, so all codecs it may need are registered right from the start.
Regards, Martin
"Martin v. Loewis" wrote:
>> Having to import the package has two benefits:
>> 1. the need for another codec package is visible in the source code
> I don't think this is a benefit for a typical application that uses multiple codecs. More often than not, the application will learn about a required encoding by means of an application-level protocol (e.g. a Content-Type in a MIME header). It doesn't really *require* any encoding; instead, it needs the codecs of any data it happens to process in a certain session. The application designer is normally not interested in a specific encoding; she expects Python to do the right thing whenever .encode is invoked.
But the requirement for a non-standard codec package is made visible this way and that's what I was referring to. An application which relies on the availability of Japanese codecs will produce an ImportError in case these are not installed.
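The point here is that a missing dependency should fail loudly and early. When codecs arrive through registration rather than an explicit import, a program can get a comparable early failure by probing the registry at start-up with codecs.lookup(), which raises LookupError for an unknown encoding. A minimal sketch in modern Python 3; require_codecs is a hypothetical helper name.

```python
import codecs

def require_codecs(*encodings: str) -> None:
    """Raise LookupError at start-up if any of the named codecs is unavailable."""
    missing = []
    for name in encodings:
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append(name)
    if missing:
        raise LookupError("missing codecs: " + ", ".join(missing))
```

Calling require_codecs("euc-jp", "shift_jis") at the top of a program reports every absent codec at once, instead of failing later at the first .encode() call that happens to need one.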
>> 2. registering the search function is delayed until the codec package is first used
> That is hardly a benefit: registering the search function is not an expensive operation, and the typical application would start with
>     try:
>         import japanese
>     except ImportError:
>         pass
>     try:
>         import windows_codepages
>     except ImportError:
>         pass
>     try:
>         import iana
>     except ImportError:
>         pass
>     try:
>         import OSFCharmaps
>     except ImportError:
>         pass
> anyway, so all codecs it may need are registered right from the start.
No. You wouldn't hide these ImportErrors if you rely on the packages being installed. If the application doesn't care for the specific encodings being installed, then the administrator could add these imports to the sitecustomize.py module after installing the codec packages. I don't think that doing this automatically is a good idea.
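A sketch of the sitecustomize.py route, in modern Python 3: the site module imports sitecustomize automatically at interpreter start-up (unless Python is started with -S), so codec packages imported there are registered for every script on the system. The package names are the ones from Martin's example and are hypothetical here. Unlike the bare "except ImportError: pass" version, this helper keeps a record of which imports failed so the administrator can inspect them.

```python
import importlib

# Hypothetical codec packages the administrator has installed.
CODEC_PACKAGES = ["japanese", "windows_codepages", "iana", "OSFCharmaps"]

def load_codec_packages(names):
    """Import each package (which registers its codecs); return (loaded, missing)."""
    loaded, missing = [], []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
        else:
            loaded.append(name)
    return loaded, missing

# Runs once at start-up, because site imports sitecustomize automatically.
loaded_codec_packages, missing_codec_packages = load_codec_packages(CODEC_PACKAGES)
```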
participants (2)
- M.-A. Lemburg
- Martin v. Loewis