Character encodings and codecs
aleax at aleax.it
Sat Feb 1 21:15:45 CET 2003
> So I would have to read it in byte by byte and manually check when I
> can make a break. There is no Python module that would make this
> easier. I thought that's what the codec module does but I can't really
> understand it.
> The specific project I'm working on now would require reading EUC-JP,
> storing characters internally as Unicode, and writing UTF-8.
The codecs module does exactly this, as long as you have an EUC-JP codec
installed of course -- just use codecs.open to open your files,
specifying each file's encoding -- data in memory is Unicode,
and you can read line by line, for example (with the readline method).
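
A minimal sketch of that approach, assuming an EUC-JP codec is registered
under the name "euc_jp" (current Python versions ship one; at the time of
writing it was a separate install). The function name transcode and the
file names are made up for illustration:

```python
import codecs
import os
import tempfile


def transcode(src_path, dst_path, src_encoding="euc_jp", dst_encoding="utf-8"):
    """Copy src_path to dst_path line by line, decoding then re-encoding."""
    # codecs.open decodes on read and encodes on write, so the program
    # only ever handles Unicode strings in memory.
    with codecs.open(src_path, "r", encoding=src_encoding) as src, \
         codecs.open(dst_path, "w", encoding=dst_encoding) as dst:
        while True:
            line = src.readline()  # a whole Unicode line, not raw bytes
            if not line:
                break
            dst.write(line)


# Demo with a throwaway EUC-JP file (hypothetical names and contents).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.euc")
dst_path = os.path.join(workdir, "output.utf8")
sample = "日本語のテキスト\n二行目\n"

with open(src_path, "wb") as f:
    f.write(sample.encode("euc_jp"))

transcode(src_path, dst_path)

with open(dst_path, "rb") as f:
    result = f.read()
```

Because the decoding happens inside the codec's reader, there is no need
to check by hand where a multi-byte character ends.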
I think has Japanese codecs available for download, but I'm
not sure because the instructions are in (I believe) Japanese,
and I cannot read that language.