[docs] [issue23232] 'codecs' module functionality + its docs -- concerning custom codecs, especially non-string ones

Jan Kaliszewski report at bugs.python.org
Tue Jan 13 16:04:27 CET 2015

New submission from Jan Kaliszewski:

To some extent, this issue is a follow-up of Issue 20132. It concerns some parts of functionality + documentation of the 'codecs' module related to registering custom codecs, especially non-string ones (i.e., codecs that encode/decode between arbitrary types, not necessarily the str and bytes types).

A few fragments of documented behaviour and/or documentation itself bother me:

0. Ad "7.2.1. Codec Base Classes"

"Each codec has to define four interfaces to make it usable as codec in Python: stateless encoder, stateless decoder, stream reader and stream writer. The stream reader and writers typically reuse the stateless encoder/decoder to implement the file protocols. Codec authors also need to define how the codec will handle encoding and decoding errors."

IMHO it is still unclear:

a) What is the relation between codecs in this sense and CodecInfo objects? (In particular, CodecInfo contains information about six interfaces, not four.)

b) How do codec authors define "how the codec will handle encoding and decoding errors"? What is the relation between this and the error handling schemes (defined as generic, not per-codec ones) documented below?
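
To illustrate point a), here is a minimal sketch that lists the interfaces carried by the CodecInfo object of a built-in codec. It assumes only the standard 'codecs' module; the variable names (info, six, present) are illustrative and not taken from the documentation.

```python
import codecs

# codecs.lookup() returns a codecs.CodecInfo object for a registered codec.
info = codecs.lookup("utf-8")

# The six interfaces a CodecInfo can carry (versus the four named in 7.2.1).
six = (
    "encode", "decode",
    "incrementalencoder", "incrementaldecoder",
    "streamreader", "streamwriter",
)

# Collect the interfaces that are actually provided (not None).
present = [name for name in six if getattr(info, name, None) is not None]
print(present)
```

For a built-in text encoding such as UTF-8 all six are expected to be present; a custom codec may leave the stream and incremental ones as None.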

1. Ad "Error Handlers" and "codecs.strict_errors(exception)"

"'strict': Raise UnicodeError (or a subclass); this is the default. Implemented in strict_errors()."

"Implements the 'strict' error handling: each encoding or decoding error raises a UnicodeError."

Is it true that it is always a UnicodeError or a subclass of it, and never just a ValueError or a subclass of it (as described in other fragments of the module documentation)?

Please note that 'strict' is documented as a universal (and not, e.g., text-encoding-only) error handling scheme. So, what about non-string codecs?
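
A small sketch of the concern, assuming the built-in bytes-to-bytes 'hex_codec' as a stand-in for a non-string codec: with the default errors='strict', a decoding failure surfaces as a ValueError subclass. The check below only inspects the exception type; the variable name 'caught' is illustrative.

```python
import codecs

try:
    # Invalid hex digits; errors defaults to 'strict'.
    codecs.decode(b"zz", "hex_codec")
except ValueError as exc:
    caught = exc
else:
    caught = None

# Show the exception type and whether it is a UnicodeError.
print(type(caught).__name__, isinstance(caught, UnicodeError))
```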

2. Ad "codecs.register_error(name, error_handler)"

"For encoding, error_handler will be called with a UnicodeEncodeError instance..." "Decoding and translating works similarly, except UnicodeDecodeError or UnicodeTranslateError will be passed..."

Again: what about non-string codecs? UnicodeError subclasses do not seem to be appropriate for them.
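
For reference, a minimal sketch of the documented text-codec case, where the registered handler does receive a UnicodeEncodeError. The handler name 'demo_record' and the function record_and_replace are hypothetical, chosen only for this illustration.

```python
import codecs

seen = []  # exception types passed to the handler


def record_and_replace(exc):
    # Record what the codec machinery hands to the handler.
    seen.append(type(exc))
    if isinstance(exc, UnicodeEncodeError):
        # Replace the unencodable span with '?' and resume after it.
        return ("?", exc.end)
    raise exc


codecs.register_error("demo_record", record_and_replace)

result = "caf\u00e9".encode("ascii", errors="demo_record")
print(result, seen)
```

The handler contract (the start/end attributes and the returned resume position) is tied to the UnicodeError subclasses, which is why it is unclear how a codec operating on other types should use it.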

3. It would be nice to address Zoinkity's concerns from Issue 20132 (partially related to the above points):

One glaring omission is any information about multibyte codecs--the class, its methods, and how to even define one.  

Also, the primary use for codecs.register would be to append a single codec to the lookup registry.  Simple usage of the method only provides lookup for the provided codecs and will not include regularly-accessible ones such as "utf-8".  It would be enormously helpful to provide an example of proper, safe usage.
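
One possible shape for such an example, as a sketch only: a search function that answers for a single name and returns None for everything else, so that lookups of other codecs continue to work. The codec name 'reversedemo' and the helper functions are hypothetical; the codec simply reverses its input to show that non-str/bytes-specific behaviour is possible via codecs.encode().

```python
import codecs


def _reverse_encode(obj, errors="strict"):
    # Stateless encoder: returns (output, length consumed).
    return obj[::-1], len(obj)


def _reverse_decode(obj, errors="strict"):
    # Stateless decoder: returns (output, length consumed).
    return obj[::-1], len(obj)


def search(name):
    # Answer only for our own codec name.
    if name == "reversedemo":
        return codecs.CodecInfo(
            encode=_reverse_encode,
            decode=_reverse_decode,
            name="reversedemo",
        )
    # Returning None means "not mine"; other lookups are unaffected.
    return None


codecs.register(search)

custom = codecs.encode("abc", "reversedemo")
builtin = codecs.lookup("utf-8").name
print(custom, builtin)
```

The name is deliberately lowercase with no separators, since the name passed to a search function is normalized before the call.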

assignee: docs at python
components: Documentation, Library (Lib)
messages: 233940
nosy: docs at python, zuo
priority: normal
severity: normal
status: open
title: 'codecs' module functionality + its docs -- concerning custom codecs, especially non-string ones
versions: Python 3.4, Python 3.5

Python tracker <report at bugs.python.org>
