[I18n-sig] naming codecs

M.-A. Lemburg mal@lemburg.com
Wed, 06 Dec 2000 13:16:19 +0100

Tamito KAJIYAMA wrote:
> Thank you for the quick reply.
> M.-A. Lemburg wrote:
> |
> | > I am considering releasing a version of the JapaneseCodecs package
> | > that will include a new codec for a variant of ISO-2022-JP.  The
> | > codec is almost the same as the ISO-2022-JP codec, but it can
> | > encode and decode Halfwidth Katakana (U+FF61 to U+FF9F), which
> | > cannot be encoded with ISO-2022-JP as defined in RFC 1468.
> | >
> | > I believe there is a demand for the codec, but I have no idea
> | > on the name of the codec.  I'd like to give it a name that is
> | > different from all standard encoding names, since the encoding
> | > for which the codec works is not defined as a standard
> | > (e.g. RFCs).  I'd also like to avoid an encoding name that is
> | > likely to be used as a standard encoding name in the future.
> | >
> | > Does anyone have a good name for the codec?  Or, how may I think
> | > about the naming of a codec?  Any suggestions are welcome.
> |
> | Why not simply append another "-<variant>" part to the name,
> | e.g. "iso-2022-jp-hw" or "iso-2022-jp-extended" ?
> I like "iso-2022-jp-extended", but I wonder if this naming
> convention may be used.  There are the standard encoding names
> ISO-2022-JP-1 and ISO-2022-JP-2 in addition to ISO-2022-JP, and
> also there are ISO-2022-CN and ISO-2022-CN-EXT.  So, a simple
> "-variant" part is likely to conflict with a standard encoding
> name in the future.  However, an abbreviated and/or tricky
> "-variant" part such as "-hw" is not user-friendly.

Hmm, I don't think there's anything user-friendly about
'iso-2022-jp' either... User-friendly would be 'japanese',
with the codec registry figuring out what the user means
by this by applying some voodoo magic ;-)

Seriously, I think the codec name should include at least
a hint as to what it does -- so perhaps a '-halfwidth-katakana'
suffix would be more appropriate. You can always provide shorter
aliases, either by providing more than one codec .py file
for the same codec or by registering a codec search function
which implements the aliasing.
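
A minimal sketch of the search-function approach. The alias names are
made up for illustration, and the stdlib 'iso2022_jp' codec is only a
stand-in for the real variant codec, which would come from the
JapaneseCodecs package.

```python
import codecs

# Hypothetical alias table: both alias names are invented for this sketch,
# and "iso2022_jp" merely stands in for the actual variant codec.
ALIASES = {
    "iso_2022_jp_halfwidth_katakana": "iso2022_jp",
    "iso_2022_jp_hw": "iso2022_jp",
}


def alias_search(name):
    """Return the CodecInfo for a known alias, or None to pass the lookup on."""
    # The registry hands over a lowercased name; normalise hyphens and
    # spaces ourselves so the table works regardless of Python version.
    key = name.lower().replace("-", "_").replace(" ", "_")
    target = ALIASES.get(key)
    if target is None:
        return None
    return codecs.lookup(target)


codecs.register(alias_search)

# Both the long descriptive name and the short alias now resolve.
print(codecs.lookup("iso-2022-jp-hw").name)
print("abc".encode("iso-2022-jp-halfwidth-katakana"))
```

Returning None for unknown names matters: it lets the registry fall
through to any other search function that is registered.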

BTW, what happened to the idea of using package names for
optional codecs? E.g. you could install the codecs in a package
'mycodecs' and then reference them as 'mycodecs.iso-1234' without
having to register a codec search function at startup.

This would not only clarify the origin of the codec, but also
allow using different codec implementations for the same encoding
(e.g. one in Python and another in C).
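
To make the package idea concrete, here is a rough sketch of how such a
lookup could behave. It assumes the registry does not resolve
package-qualified names by itself, so it emulates the behaviour with a
search function. 'mycodecs' and 'iso_1234' are hypothetical; the fake
module below reuses the stdlib Latin-1 codec purely as a placeholder,
and getregentry() is assumed as the module's entry point, following the
convention of the stdlib encodings package.

```python
import codecs
import importlib
import sys
import types

# Only packages listed here are searched, so an encoding name cannot
# trigger the import of an arbitrary module.
TRUSTED_PACKAGES = {"mycodecs"}


def package_search(name):
    """Resolve 'package.codec-name' to the module package.codec_name."""
    package, dot, codec = name.lower().rpartition(".")
    if not dot or package not in TRUSTED_PACKAGES:
        return None
    module_name = package + "." + codec.replace("-", "_").replace(" ", "_")
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    getregentry = getattr(module, "getregentry", None)
    return getregentry() if getregentry else None


codecs.register(package_search)

# Stand-in for an installed package: a fake 'mycodecs.iso_1234' module
# whose getregentry() simply hands back the stdlib Latin-1 codec.
pkg = types.ModuleType("mycodecs")
pkg.__path__ = []  # mark the fake module as a package
mod = types.ModuleType("mycodecs.iso_1234")
mod.getregentry = lambda: codecs.lookup("latin-1")
sys.modules["mycodecs"] = pkg
sys.modules["mycodecs.iso_1234"] = mod

print("caf\xe9".encode("mycodecs.iso-1234"))
```

With this scheme a second implementation of the same encoding would
just live in another package and be selected by its package prefix.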

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/