Encoding questions

Irmen de Jong irmen at -NOSPAM-REMOVETHIS-xs4all.nl
Wed Jul 2 15:46:05 EDT 2003


Marc wrote:

> Is there any way to see what encodings are installed on my computer,
> or a reference of the encodings distributed with Python and what each
> of them is for? I've looked at the reference for the codecs module and
> imagine there's a folder full of Codec classes somewhere; is this how
> it works?

Yep, look in <python-install-dir>/Lib/encodings

I am not aware of a 'clean' way to list all available codecs from
within your Python code.
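
A rough workaround is to list the modules in that same directory
from code. The sketch below assumes a current Python 3 with the
standard library installed as ordinary files on disk. The helper
name list_codec_modules is made up for illustration. Not every module
in the package is a text encoding (a few are byte-to-byte transforms
such as base64), and codecs registered by third-party packages will
not show up.

```python
import encodings
import pkgutil


def list_codec_modules():
    """Return sorted names of the modules in the stdlib 'encodings' package.

    Hypothetical helper: it only scans the package directory, so it shows
    what ships with Python, not codecs registered elsewhere at runtime.
    """
    names = [info.name for info in pkgutil.iter_modules(encodings.__path__)]
    # 'aliases' holds the alias table rather than a codec, so skip it.
    return sorted(name for name in names if name != "aliases")


codec_modules = list_codec_modules()
print(len(codec_modules), "codec modules found")
print(codec_modules[:10])
```

The alias table itself (encodings.aliases.aliases) is a plain dict, so
it can be inspected the same way to see which spellings map to which
module.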

> The specific problem I have is a Unicode file that contains a bullet
> character (Unicode BULLET, U+2022). Ultimately, it's going to be written
> into a PDF file (via ReportLab). The default ASCII encoding doesn't
> work, which is OK; surprisingly neither does 'latin-1', yet 'cp1252'
> does work. These are just encodings I've seen somewhere. Basically,
> I'd like to know how to go about choosing the best/most appropriate
> encoding for a given application.

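ASCII, latin-1 and cp1252 are encodings that each cover only a
*subset* of the full Unicode character range. U+2022 BULLET is not
in ASCII or latin-1, but cp1252 maps it to byte 0x95, which is why
only cp1252 worked for you. An encoding that you might find useful is
UTF-8 (http://www.cl.cam.ac.uk/~mgk25/unicode.html).
It is widely supported and can represent any Unicode character.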
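
To see which encodings can hold a given character, you can simply try
each one. This is a small sketch in Python 3 syntax; try_encode is a
hypothetical helper, and the list of encodings is just the ones
mentioned in this thread.

```python
BULLET = "\u2022"  # Unicode BULLET


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


results = {
    enc: try_encode(BULLET, enc)
    for enc in ("ascii", "latin-1", "cp1252", "utf-8")
}

for enc, encoded in results.items():
    print(enc, "->", encoded if encoded is not None else "cannot encode")
```

ASCII and latin-1 come back as None, cp1252 gives a single byte, and
UTF-8 gives a three-byte sequence.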

If you're processing Unicode text inside your app, stick with
Python's built-in Unicode string type, and only encode to bytes
when you write the result out.
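
In practice that means decoding once on input and encoding once on
output. A minimal sketch, assuming Python 3 (where str is the Unicode
type), UTF-8 input, and cp1252 as the output encoding only because
that is the one you reported as working; the sample bytes are made up.

```python
# Bytes as they might arrive from a UTF-8 file (made-up sample data).
raw = b"\xe2\x80\xa2 first item"

# Decode once at the boundary; from here on, work with Unicode strings.
text = raw.decode("utf-8")
processed = text.upper()

# Encode once, at the point where bytes are actually required.
out = processed.encode("cp1252")
print(repr(processed), "->", out)
```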

--Irmen




