How to support a non-standard encoding?
wxjmfauth at gmail.com
Fri Jan 6 15:00:00 EST 2012
On 6 jan, 11:03, Ivan <i... at llaisdy.com> wrote:
> Dear All
> I'm developing a python application for which I need to support a
> non-standard character encoding (specifically ISO 6937/2-1983, Addendum
> 1-1989). Here are some of the properties of the encoding and its use in
> the application:
> - I need to read and write data to/from files. The file format
> includes two sections in different character encodings (so I
> shan't be able to use codecs.open()).
> - iso-6937 sections include non-printing control characters
> - iso-6937 is a variable width encoding, e.g. "A" = ,
> "Ä" = [0xC8, 0x41]; all non-spacing diacritical marks are in the
> range 0xC0-0xCF.
> By any chance is there anyone out there working on iso-6937?
> Otherwise, I think I need to write a new codec to support reading and
> writing this data. Does anyone know of any tutorials or blog posts on
> implementing a codec for a non-standard character encoding? Would
> anyone be interested in reading one?
Take a look at the files (Python modules) in
...\Lib\encodings. This is where all the codecs
are centralized. Python picks them up automatically
as long as they are present in that directory.
I remember that a long time ago, for fun, I created such
a codec quite easily. I picked one of the files as a
template and modified its "table". It was a
byte <-> byte table.
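
A minimal sketch of such a byte <-> byte table codec, assuming Python 3.
The codec name "xdemotable" and the single changed table slot are
hypothetical, chosen only for illustration. It relies on
codecs.charmap_build, the helper the table-based modules in Lib/encodings
use themselves, and registers the codec from application code rather than
adding a file to the encodings directory.

```python
import codecs

# Hypothetical 256-entry decoding table: identity (Latin-1 style) mapping,
# with one slot changed purely for demonstration.
_table = [chr(i) for i in range(256)]
_table[0xA4] = "\u20ac"  # hypothetical: byte 0xA4 decodes to EURO SIGN
DECODING_TABLE = "".join(_table)

# Build the reverse (str -> byte) table from the decoding table.
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors="strict"):
    # Returns a (bytes, number_of_characters_consumed) tuple.
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors="strict"):
    # Returns a (str, number_of_bytes_consumed) tuple.
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # Codec search function: return a CodecInfo for our name, else None.
    if name == "xdemotable":
        return codecs.CodecInfo(name="xdemotable", encode=_encode, decode=_decode)
    return None


codecs.register(_search)

encoded = "A\u20ac".encode("xdemotable")
decoded = b"A\xa4".decode("xdemotable")
```

Once the search function is registered, the name works anywhere a codec
name is accepted, e.g. str.encode() and bytes.decode().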
For a multibyte coding scheme, it may be a little bit more
complicated; you may take a look, e.g., at the mbcs.py codec.
Distributing such a codec may be a problem.
Another simple approach, OS independent:
You probably do not write your code in iso-6937; you
only need to encode/decode some byte sequences
"on the fly". In that case, work with bytes and create
a pair of encoding / decoding functions with a
purpose-built <dict> [*] as helper. It's not so complicated.
Use <unicode> (Py2) or <str> (Py3, the recommended
way ;-) ) as the pivot encoding.
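
A rough sketch of what such a pair of functions could look like, assuming
Python 3 with <str> as the pivot. The function names are hypothetical, and
the dict is deliberately incomplete: only the 0xC8 (diaeresis) entry comes
from the example quoted above; the remaining 0xC0-0xCF entries, and any
bytes above 0x7F, would have to be filled in from the standard. Bytes below
0x80 are assumed to map like ASCII. The diacritic byte precedes the base
letter in the data, whereas the Unicode combining mark follows it.

```python
import unicodedata

# Partial table: diacritic byte -> Unicode combining mark.
# Only 0xC8 is taken from the thread; the rest must come from the standard.
DIACRITIC_TO_COMBINING = {0xC8: "\u0308"}  # COMBINING DIAERESIS
COMBINING_TO_DIACRITIC = {v: k for k, v in DIACRITIC_TO_COMBINING.items()}


def decode_6937(data):
    """Decode bytes to str; the diacritic byte comes before its base letter."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xC0 <= b <= 0xCF:
            if b not in DIACRITIC_TO_COMBINING or i + 1 >= len(data):
                raise ValueError("cannot decode diacritic at offset %d" % i)
            base = data[i + 1]
            if base >= 0x80:
                raise ValueError("unsupported base byte at offset %d" % (i + 1))
            # Unicode order: base letter first, then the combining mark.
            out.append(chr(base) + DIACRITIC_TO_COMBINING[b])
            i += 2
        elif b < 0x80:
            # Assumed ASCII-compatible, control characters included.
            out.append(chr(b))
            i += 1
        else:
            raise ValueError("byte 0x%02X not in this partial table" % b)
    # Compose base + mark into a single code point where one exists.
    return unicodedata.normalize("NFC", "".join(out))


def encode_6937(text):
    """Encode str to bytes, emitting the diacritic byte before the base."""
    decomposed = unicodedata.normalize("NFD", text)
    out = bytearray()
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        nxt = decomposed[i + 1] if i + 1 < len(decomposed) else ""
        if ord(ch) >= 0x80:
            raise ValueError("character %r not in this partial table" % ch)
        if nxt in COMBINING_TO_DIACRITIC:
            out += bytes([COMBINING_TO_DIACRITIC[nxt], ord(ch)])
            i += 2
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)
```

For example, decode_6937(b"\xc8A") gives "Ä", and encode_6937("Ä") gives
the two bytes 0xC8 0x41 back. Since the functions work on plain bytes, they
can be applied to just the iso-6937 sections of a file read in binary mode.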
[*] I also once created such a dict from
I never checked whether it corresponds to the "official" cp1252.